Distributed caching system

ABSTRACT

Embodiments of a distributed caching system are disclosed that cache data across multiple computing devices on a network. In one embodiment, a first caching system serves as a caching front-end to a distributed cluster of additional caching systems. The caching systems may be spread over multiple partition groups. In one embodiment, cache writes at a cache system in one partition group are distributed to other partition groups. By propagating the cache writes across multiple partition groups, the caches at the different partition groups include more recently accessed data, thereby increasing the likelihood of cache hits.

BACKGROUND

Caching is computer technology that allows computer processes to be accelerated. Generally, a computer uses multiple levels of memory types of varying sizes and speeds, with cheaper memory being generally slower and bigger in storage size and more expensive memory being generally faster and smaller in storage size. As faster memory is generally small in storage size, only a limited amount of data can be stored on the faster memory types in the computer. Generally described, caching attempts to anticipate what data items are needed by the computer in the future and attempts to keep those data items in the limited amounts of the faster memory types in order to improve performance by reducing access times to the data items. These data items can be maintained in a cache data structure in the computer memory.

Generally described, a cache (e.g., an application cache) includes a data structure that transparently stores data so that future requests for that data can be served faster. The data that is stored within a cache may include values that have been computed earlier or duplicates of original values that are stored elsewhere. If requested data is contained in the cache (e.g., cache hit), the request can be served by reading the cache instead of calculating the data or retrieving the data from elsewhere. If reading from the cache is faster than calculating or retrieving the data, then the request is served faster. Otherwise (e.g., cache miss), the data has to be recomputed or fetched from its original storage location, which can be comparatively slower. Generally, the greater the number of requests that can be served from the cache, the faster the overall system performance.

Generally, cache sizes are small relative to bulk computer storage. Nevertheless, caches have proven themselves useful in many areas of computing because access patterns in typical computer applications have locality of reference. For example, references exhibit temporal locality if data is requested again that has been recently requested already. In another example, references exhibit spatial locality if data is requested that is physically stored close to data that has been requested already. Thus, caches can be beneficial, despite being able to fit only a portion of data stored in the bulk computer storage.

BRIEF DESCRIPTION OF DRAWINGS

Throughout the drawings, reference numbers may be re-used to indicate correspondence between referenced elements. The drawings are provided to illustrate example embodiments described herein and are not intended to limit the scope of the disclosure.

FIG. 1 is a block diagram schematically illustrating an embodiment of a distributed caching system for caching data across multiple computing devices.

FIG. 2A schematically illustrates an embodiment of the front-end system of FIG. 1.

FIG. 2B schematically illustrates an example data flow between an embodiment of the front-end system of FIG. 2A and the client devices during write operations.

FIGS. 3A and 3B schematically illustrate embodiments of the data flow between various components of the distributed caching system of FIG. 1.

FIG. 4 schematically illustrates a logical flow diagram for a process to look up data in the distributed caching system of FIG. 1.

FIG. 5 illustrates a distributed services system that utilizes an embodiment of the distributed cache system of FIG. 1.

FIG. 6 illustrates an embodiment of the data flow in the distributed services system of FIG. 5 during a TLS authentication.

FIG. 7 schematically illustrates a logical flow diagram for a process to look up data in the distributed services system of FIG. 5.

FIG. 8 schematically illustrates an embodiment of a data flow in a distributed caching system configured to enable resumption of security handshake transactions.

FIG. 9 illustrates an embodiment of a data flow in a distributed caching system having caching systems distributed across multiple data centers or other partition groups.

DETAILED DESCRIPTION

Overview

Accessing slower memory types can cause central processing unit (CPU) delay as the CPU waits for data items from the slower memory types to become available. This delay can be characterized as an expense that slows down the performance of a first computing system. One possible solution to this problem involves caching data items on faster memory types. However, as storage sizes of the faster memory types are limited, some searches for data items in the cache can result in a cache hit with the data item found and some searches can result in a cache miss, where the data item is not found. Thus, increasing the chances of the data item being found in the cache (sometimes referred to as the hit rate) can improve the performance of the computing system by reducing delays. For example, the hit rate can be increased by making available to the first computing system caches found on other computing systems on a network, thereby effectively increasing the storage size of the cache available to the first computing system and increasing the chance of a data item being found in the expanded, distributed cache.

Embodiments of a distributed caching system (DCS) are disclosed that cache data items across multiple computing devices on a network. In one embodiment, a first cache system (sometimes referred to herein as a caching system) of the DCS serves as a caching front-end or client interface to a distributed cluster of the other cache systems in the DCS, where the first cache system can receive and distribute cache requests to the additional cache systems, as needed. The first cache system can also serve as a cache server itself, by storing data items on its own internal cache. For example, the first cache system can first attempt to find a requested data item on the internal cache, but, if the lookup results in a cache miss, the first cache system can search the additional cache systems for the data item. In some embodiments, the first cache system is configured to multiplex requests to each additional cache system over a single Transmission Control Protocol (TCP) socket, which allows for network efficiencies and faster detection of failure. In some embodiments, the first cache system is configured to identify additional requests for the same data item and duplicate the requested data item in order to respond to the additional requests, which allows for greater responsiveness to requests.

In some embodiments, the distributed caching system is configured to store session state identifiers in a networked cache, while dynamically allocating requests to servers. Client devices can then resume secure sessions even if assigned to new servers as the new servers can obtain the session state identifiers from the distributed caching system. In at least some cases, the client device can be authenticated without the server having to perform a full authentication, thereby reducing the workload of the server and decreasing latency as the server can respond faster.

In some embodiments, the cache systems of the distributed caching system are spread over multiple data centers. In one embodiment, cache writes at a cache system in one data center are distributed to other data centers. By propagating the cache writes across multiple data centers, the caches at the different data centers include more recently accessed data, thereby increasing the likelihood of cache hits.

Various aspects of the disclosure will now be described with regard to certain examples and embodiments, which are intended to illustrate but not to limit the disclosure. Nothing in this disclosure is intended to imply that any particular feature or characteristic of the disclosed embodiments is essential. The scope of protection of certain inventions is defined by the claims.

Examples of a Distributed Caching System

FIG. 1 is a block diagram schematically illustrating an embodiment of a distributed caching system 100 for caching data items across multiple computing devices (e.g., servers, other computers, etc.). The distributed caching system 100 can include one or more computing devices comprising processor(s), computer memory, network interface(s) and/or mass storage. The distributed caching system 100 can also include other computing systems, with each sub-system including one or more computing devices. Some or all of the computing devices or systems of the distributed caching system 100 can include a local cache. The distributed caching system 100 can include physical computing systems and/or virtual computing systems operating on physical computing systems. For example, the distributed caching system 100, in one embodiment, can be implemented as a group of virtual machine instances operating on computers in a data center.

Generally, caches include copies of data items that are kept temporarily in the cache but that are primarily stored, more persistently, elsewhere on a primary data storage device or devices. Caches are usually significantly smaller than these primary data storage devices and thus cannot fit all the data items from the primary data storage device. Caches are also usually operating on cache memory that is significantly faster than the primary data storage device. Therefore, caches typically attempt to include the most used data items from the primary data storage in order to improve the performance of a computing system.

In some embodiments, the distributed caching system 100 is logically positioned between various computing services and one or more client devices 125 connected via a network 120 a. For example, the distributed caching system 100 may provide caching for a web server or other computing service that processes requests from the client devices. The distributed caching system can receive data item requests from the client devices, for example, on a network interface (e.g., Ethernet or other network interface card (NIC), 802.11a/b/g/n receiver, etc.) and process those requests. In one example, the client devices 125 operate web browsers that request web page data items from the distributed caching system 100. In one embodiment, the distributed caching system requests and/or pre-processes web pages on behalf of the web browser users. The distributed caching system 100 may be part of a cloud computing service provider that hosts many distinct web sites of many different entities, and the distributed caching system caches data items for those distinct web sites.

The distributed caching system 100 can cache data items from a variety of primary data sources, where the data items are primarily stored. For example, data items can be primarily stored on one or more mass storage devices on the distributed caching system. Such mass storage devices can include mechanical drives (e.g., hard disk drives, optical disk, tape), solid state drives (SSDs) that use integrated circuit assemblies as memory to store data items persistently, or combinations of the above (e.g., hybrid drives). Cached data items can also be primarily stored on external systems. In some embodiments, the distributed caching system 100 is connected, via a network 120 b, to one or more servers, databases, and/or other data repositories where the primary copies of the data items are stored. For example, in the case of a cache miss, the distributed caching system 100 can search for the data items on other computing devices on the network 120 b.

In addition to caching, individual computers used by the distributed caching system 100 can also be used to provide other functions and services, such as running virtual machine instances, providing computing resources such as computing power and storage, running one or more web servers and/or providing other computing services. For example, the same computer may provide caching services as well as web server services.

In one embodiment, a front-end system 102 for the distributed caching system 100 accepts requests on behalf of the distributed caching system 100, attempting to fulfill the request and/or finding an external cache (e.g., located on a separate computing device) that contains the requested data items. The front-end system 102 can be connected via a first network 120 a (e.g., the Internet) to one or more client devices 125. The front-end system 102 can also be connected, via a second network 120 b (e.g., a data center or private network), to one or more external caches 145 a-c operating on one or more back-end cache systems 150 a-c.

In some embodiments, some or all of the computing devices of the distributed caching system 100 can act as a front-end system 102. For example, a load-balancer or work distribution module may lie between the distributed caching system 100 and the client devices 125. The load-balancer may receive the requests from the client devices and assign them to a selected one of the computing devices of the distributed caching system 100 through a selection algorithm (e.g., round-robin, random, hashing, etc.). Once the selected computing device receives the request, it can attempt to fulfill the request using its internal cache, and, if the data items are not on the cache, the selected computing device can search on the network 120 b for a computing device that contains the requested data items. FIG. 5 describes such embodiments in additional detail.

The distributed caching system 100 can support multiple levels of cache lookup. For example, the distributed caching system 100 can provide multiple levels of application caches to applications running on application servers. In one two-level embodiment, the first level involves the front-end system 102 performing a local lookup of the requested data items. If the lookup results in a cache miss, the front-end system can proceed to the second level by searching on the network at the additional external caches 145 a-c for the data items. In some embodiments, the first cache system is configured to multiplex requests to each external cache system over a single TCP socket or other negotiated streaming protocol connection (e.g., Stream Control Transmission Protocol (SCTP), Unix domain sockets or other inter-process communication (IPC) socket, Synchronous Link Control (SLC), or the like), which allows for network efficiencies and faster detection of failure. In one embodiment, requests are multiplexed over a first connection protocol (e.g., Unix domain sockets or IPC sockets) for local communications in the front-end system 102 and a second connection protocol (e.g., TCP) for network communications.
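The two-level lookup described above can be sketched as follows. This is a minimal illustration only: the class name TwoLevelCache, the pick_backend callable, and the use of plain dictionaries in place of networked external caches are assumptions made for the example and are not part of the disclosure.

```python
from typing import Callable, Dict, Optional


class TwoLevelCache:
    """Hypothetical front-end: a local cache backed by external caches."""

    def __init__(self, external_caches: Dict[str, Dict[str, bytes]],
                 pick_backend: Callable[[str], str]):
        self.local: Dict[str, bytes] = {}
        # Plain dicts stand in for caches reached over the network.
        self.external_caches = external_caches
        # Assumed helper that maps a key to the name of a back-end cache.
        self.pick_backend = pick_backend

    def get(self, key: str) -> Optional[bytes]:
        # Level 1: local lookup on the front-end system.
        value = self.local.get(key)
        if value is not None:
            return value
        # Level 2: ask the external cache selected for this key.
        backend = self.external_caches[self.pick_backend(key)]
        value = backend.get(key)
        if value is not None:
            # Keep a local copy so later requests hit at level 1.
            self.local[key] = value
        return value
```

A result of None from get corresponds to a miss at both levels, at which point a caller would fall back to the primary data source.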

As discussed above, the front-end system 102 can act as a cache server, responding to requests using data items from its own internal cache. The front-end system 102 may also act as a cache proxy, receiving and/or managing requests on behalf of the distributed caching system 100. In one embodiment, the front-end system 102 distributes requests among remaining systems in the distributed caching system 100.

In one embodiment, the front-end system 102 uses a hash function to consistently select a particular back-end cache system to handle a given data item. For example, the front-end system 102 can determine or generate a key from the request. It can then apply a hash function to the key to determine a particular back-end cache system to send the request to. By applying the hash function, the front-end system can consistently identify the particular back-end cache system whenever the same data items are requested. For example, if a client device 125 makes a first request for a first data item, the front-end system 102 can select a first back-end system to handle the request. If the client device (or another client device) makes a second request for the same first data item, the front-end system 102 can select the same first back-end system to handle the request. As the first back-end system handled the first request, the first back-end system will likely have the first data item in its cache and will not have to obtain the data item from primary storage, thereby speeding up the response time.

Consistent hashing is one example hashing algorithm that can be used by the distributed caching system 100 to map keys or data items to cache systems. In consistent hashing, the mapping of keys to slots (e.g., cache systems) remains largely static even if the number of slots changes (increases or decreases). For example, when the number of slots changes, on average, only K/n keys need to be remapped, where K is the number of keys and n is the number of slots. Consistent hashing is based on mapping each object to a point on the edge of a circle (or equivalently, mapping each object to a real angle). Each available machine (or other storage slot) is mapped to many pseudo-randomly distributed points on the edge of the same circle. To find where an object should be placed, a system using consistent hashing finds the location of that object's key on the edge of the circle, then walks around the circle until falling into the first slot it encounters (or equivalently, the first available slot with a higher angle). The result is that each slot contains all the resources located between its point and the previous slot point. Consistent hashing is useful for the distributed caching system 100 as the number of cache systems can change based on failures and/or recoveries of the cache systems. Consistent hashing allows the distributed caching system 100 to account for those changes efficiently.
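As an illustration of the circle-based mapping described above, the following is a minimal sketch of a consistent hash ring. The class name HashRing, the use of SHA-256 to place points on the ring, and the default of 100 points per slot are assumptions chosen for the example, not details taken from the disclosure.

```python
import bisect
import hashlib
from typing import List, Tuple


def _point(label: str) -> int:
    # Map a string to a position on the ring (a 64-bit integer "angle").
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical ring that maps keys to slots (e.g., cache systems)."""

    def __init__(self, slots: List[str], points_per_slot: int = 100):
        # Each slot is placed at many pseudo-random points on the ring.
        self._ring: List[Tuple[int, str]] = sorted(
            (_point(f"{slot}#{i}"), slot)
            for slot in slots
            for i in range(points_per_slot)
        )
        self._positions = [position for position, _ in self._ring]

    def slot_for(self, key: str) -> str:
        # Walk around the ring to the first slot point at or after the
        # key's position, wrapping to the start of the ring if needed.
        i = bisect.bisect_left(self._positions, _point(key))
        return self._ring[i % len(self._ring)][1]
```

Because the points for existing slots do not move when a slot is added, a key changes owner only when one of the new slot's points lands between the key and its previous owner, which is why only a fraction of the keys is remapped.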

Referring back to the embodiment with a dynamically determined front-end system 102 described above, where the multiple systems of the distributed caching system 100 are capable of acting as a front-end, in one embodiment, each front-end candidate system uses the same hashing function to determine which back-end system is assigned to handle a request. For example, if a first front-end system receives a first request for the first data item, it selects a first back-end cache system. If a different front-end system receives a second request for the same first data item, it selects the same first back-end cache system. By consistently selecting back-end cache systems, the hit rate of the distributed caching system 100 will likely increase because the same back-end cache system is handling the same requested data item, regardless of from which front-end system on the distributed caching system 100 the request is received.

In some embodiments, TCP is used by the distributed caching system 100 to communicate between its components. TCP includes flow-control and congestion control properties that are desirable in a distributed caching architecture, given the large number of hosts that can be sharing a single instance (e.g., 2,000+). Flow control can be an important part of high-throughput request-reply systems and can enable graceful slowdown in the face of overload. Lack of proper end-to-end flow control is a common cause for complete failure in the face of high traffic. For example, in the absence of flow-control, a few misbehaving clients may have the ability to take down an entire fleet of cache systems or otherwise saturate the network 120 b. While the distributed caching system 100 uses TCP in some embodiments, in other embodiments, the distributed caching system 100 can use other protocols, such as User Datagram Protocol (UDP), for its communications.

In one embodiment, the distributed caching system 100 implements a dead end-point detection mechanism based on the notion of inactivity. For example, a connection end-point may be considered inactive if no new traffic is received after a specified amount of time. This allows the distributed caching system 100 to shut down inactive TCP end-points, thereby increasing the number of TCP end-points the distributed caching system 100 can handle before running into TCP connection limits. One benefit of such an embodiment is more robustness in the case of SYN (synchronize message) floods; such an embodiment can shut down connections, thereby making it more difficult for the SYN flood to saturate the network.

In one embodiment, the determination of inactivity of a TCP end-point is based at least partly on how the TCP connection is used. For example, if a TCP connection is being used to request or respond to a large data request, the time-out before end-points of that TCP connection are considered inactive may be longer than if the TCP connection is being used for a small data request.
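One possible way to express inactivity-based detection with a usage-dependent time-out is sketched below. The names InactivityTracker, touch and reap, and the 5 second and 60 second time-outs, are hypothetical values chosen for illustration; the disclosure does not specify them.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Endpoint:
    last_activity: float          # time of the most recent traffic
    large_transfer: bool = False  # True while serving a large data request


class InactivityTracker:
    """Hypothetical tracker that reaps end-points with no recent traffic."""

    def __init__(self, small_timeout: float = 5.0, large_timeout: float = 60.0):
        self.small_timeout = small_timeout  # assumed value, in seconds
        self.large_timeout = large_timeout  # assumed value, in seconds
        self.endpoints: Dict[str, Endpoint] = {}

    def touch(self, name: str, large_transfer: bool = False,
              now: Optional[float] = None) -> None:
        # Record new traffic on an end-point.
        now = time.monotonic() if now is None else now
        self.endpoints[name] = Endpoint(now, large_transfer)

    def reap(self, now: Optional[float] = None) -> List[str]:
        # Remove and return end-points idle for longer than their time-out.
        now = time.monotonic() if now is None else now
        dead = [
            name for name, ep in self.endpoints.items()
            if now - ep.last_activity >
            (self.large_timeout if ep.large_transfer else self.small_timeout)
        ]
        for name in dead:
            del self.endpoints[name]
        return dead
```

In a real system the caller would close the underlying TCP connection for each name that reap returns.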

Various embodiments of the distributed caching system 100 may implement additional features. For example, the distributed caching system may allow randomization of retries of transactions. This can reduce contention or collisions in the network. In one embodiment, the distributed caching system 100 implements a stronger flow-control process between the client and the responding system. For example, clients write and then read. Servers read and then write. The TCP windows allow both the clients and servers to behave in a pipelined fashion, thereby increasing network efficiency.
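Randomized retries can be illustrated with a short sketch. The disclosure states only that retries may be randomized; the exponentially growing cap on the random delay, the function name retry_with_jitter, and all default parameter values below are assumptions added for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_jitter(operation: Callable[[], T],
                      attempts: int = 4,
                      base_delay: float = 0.05,
                      max_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> T:
    """Retry a failing operation, waiting a random time between attempts."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure
            # Assumed policy: random delay up to an exponentially growing cap,
            # so that clients that failed together do not retry together.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)
    raise ValueError("attempts must be at least 1")
```

The sleep and rng parameters exist only so that the behavior can be exercised without real delays.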

In-Memory Caching

In one embodiment, each computing system (or some of the systems) in the distributed caching system 100 uses a caching structure to provide an in-memory cache to users of each computer. In one embodiment, the in-memory cache is configured to provide high memory efficiency. Typically, random access memory (RAM) is a limited resource on computing devices, particularly for computing devices that provide services to multiple users (e.g., multiple web server instances). A more memory efficient in-memory cache can result in fewer network calls compared to a less memory efficient cache. For example, higher memory efficiency can allow more data items to be stored on the in-memory cache, increasing the hit rate of the cache and thereby reducing the need to search external caches on the network for the data items. Typically, local memory access is significantly faster than accessing memory over the network; thus, avoiding network calls tends to increase performance.

The caching structure can be maintained on the memory of the computing system. The memory can include processor cache (e.g., L1, L2, L3, etc.) and/or main memory (e.g., DDR RAM, SDRAM, other types of RAM, etc.). In some embodiments, the caching structure uses multiple levels of caches, with small fast caches backed up by larger slower caches. For example, the caching structure may operate by checking a smallest level 1 (L1) cache first; if it hits, the processor proceeds at high speed. If the smaller cache misses, the next larger cache (L2) is checked, and so on, before main memory is checked. In some embodiments, the caching structure uses only a single cache memory type.

While some embodiments use a slab allocation technique or a best fit allocation technique to assign storage space, in one embodiment, the caching structure utilizes small, fixed size storage blocks for storing data items. By using small, fixed size storage blocks, more data items can be stored on the same size of cache memory through higher efficiency. Generally, if a stored data item does not fill a storage block, the extra space is not used and is considered “wasted.” For example, if a first cache uses 128 KB storage blocks and a second cache uses 1024 KB storage blocks, then if the caches store a 128 KB data item, the first cache will have 0 KB (128 KB−128 KB=0 KB) of wasted space (100% efficiency), while the second cache will have 896 KB (1024 KB−128 KB=896 KB) of wasted space (12.5% efficiency). The efficiency advantage of using smaller storage blocks can also extend to larger stored data items. Generally, a data item can be subdivided into chunks equal to the block size, with any remainder data taking up a last storage block. If the last storage block is not filled, then space on the last block is wasted. Going back to the above example, if the caches store a 600 KB data item, the first cache uses five storage blocks with 40 KB (5*128 KB−600 KB=40 KB) of wasted space (93.75% efficiency), while the second cache uses one storage block with 424 KB (1024 KB−600 KB=424 KB) of wasted space (58.6% efficiency).
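The block arithmetic in this example can be checked with a few lines of code. The helper name block_usage is hypothetical; the calculation simply rounds the item size up to a whole number of blocks and compares the allocated space with the item size.

```python
from typing import Tuple


def block_usage(item_kb: int, block_kb: int) -> Tuple[int, int, float]:
    """Return (blocks used, wasted KB, efficiency) for one stored item."""
    blocks = -(-item_kb // block_kb)      # ceiling division
    allocated_kb = blocks * block_kb
    wasted_kb = allocated_kb - item_kb
    efficiency = item_kb / allocated_kb   # fraction of allocated space used
    return blocks, wasted_kb, efficiency


if __name__ == "__main__":
    for item_kb in (128, 600):
        for block_kb in (128, 1024):
            blocks, wasted, eff = block_usage(item_kb, block_kb)
            print(f"{item_kb} KB item, {block_kb} KB blocks: "
                  f"{blocks} block(s), {wasted} KB wasted, {eff:.1%} efficient")
```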

However, using smaller blocks can increase access times for the cache. For example, referring to the above example for the 600 KB data item, five blocks need to be accessed from the first cache while only one block needs to be accessed on the second cache. Assuming the access times are roughly similar for different size blocks, the access time for the first cache for the 600 KB data item may be up to 5 times longer than the second cache. In some cases, accessing multiple blocks can be even more expensive if the blocks are stored in the memory cache in non-contiguous blocks. Nevertheless, the first cache may still be faster on average even if the access times are longer per operation if the cache hit rate of the first cache is sufficiently higher than the cache hit rate of the second cache. For example, assume the first cache has an access time of 5X and a hit rate of 90% while the second cache has an access time of X and a hit rate of 80% and the cost of a cache miss is 100X, where X is some arbitrary period of time. Then, for 100 requests, the first cache, on average, will take 14.5X to serve each request (5X*90%+100X*10%=14.5X). Meanwhile, the second cache, on average, will take 20.8X (X*80%+100X*20%=20.8X). Thus, depending on the circumstances, higher cache access times from using smaller block sizes can be more than offset by higher hit rates because of the generally significantly higher costs of accessing the slower, primary storage device where the data items are primarily stored.
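The average cost per request in this example follows the same weighting the paragraph uses: the hit cost weighted by the hit rate plus the miss cost weighted by the miss rate. A small sketch, with the hypothetical helper name expected_cost and all times expressed in units of X:

```python
def expected_cost(hit_time: float, hit_rate: float, miss_cost: float) -> float:
    """Average time per request, in the same unit as the inputs."""
    return hit_time * hit_rate + miss_cost * (1.0 - hit_rate)


if __name__ == "__main__":
    first = expected_cost(hit_time=5.0, hit_rate=0.90, miss_cost=100.0)
    second = expected_cost(hit_time=1.0, hit_rate=0.80, miss_cost=100.0)
    print(f"first cache:  {first:.1f}X per request")
    print(f"second cache: {second:.1f}X per request")
```

With the numbers from the example, the slower cache with the higher hit rate still has the lower average cost, because the miss cost dominates.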

The caching structure can implement various cache algorithms (also called replacement algorithms or replacement policies) to manage a cache of information stored on the computer. When the cache is full, the algorithm chooses which items to discard to make room for the new data items.

In some embodiments, the caching structure can use a cache algorithm to determine which data items to discard from the cache. Some example cache algorithms include: least recently used (LRU), most recently used (MRU), pseudo-LRU, random replacement, segmented LRU (SLRU), 2-way set associative, direct-mapped cache, adaptive replacement cache, clock with adaptive replacement and multi queue caching.
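As one concrete instance of the replacement policies listed above, the following is a minimal sketch of a least recently used (LRU) cache. The class name LRUCache and its interface are assumptions for illustration; the disclosure does not prescribe a particular implementation.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Hypothetical fixed-capacity cache that discards the LRU item."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Ordered from least recently used (front) to most recent (back).
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Optional[Any] = None) -> Any:
        if key not in self._items:
            return default               # cache miss
        self._items.move_to_end(key)     # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Cache is full: discard the least recently used item.
            self._items.popitem(last=False)
```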

As will be recognized, the arrangement shown in FIG. 1 represents just one of many possible ways that the distributed caching system 100 may be arranged in a network. For example, the illustrated networks 120 a, 120 b may be different networks or part of the same network (e.g., the Internet). In one embodiment, the first network 120 a is a public network while the second network 120 b is a private network or vice versa. For ease of reference, the disclosure generally uses the term “network 120” to refer to either or both networks 120 a, 120 b.

The client computing devices 125 may include, for example, personal computers (PCs), personal digital assistants (PDAs), cellular telephones, laptops, tablets, e-book readers and other types of devices that support web browsing or otherwise access online computing services. For example, the client devices may be used by users to connect to various types of web sites, such as sites related to shopping, news, organizations, sports, games, product reviews or the like. These web sites may be hosted by various web servers.

As discussed above, data items can be primarily stored and cached from a variety of data repositories that can be local to components of the distributed caching system 100 or can be on networked or distributed systems. The data repositories may be implemented using any type or types of physical computer storage. For example, such data repositories can include magnetic hard drives, solid state drives or memory, optical disc and/or the like. Various data structures can be used to store the data items, such as electronic files, databases or other data structures.

Examples of Caching Systems

FIG. 2A schematically illustrates an embodiment of the front-end system 102 of FIG. 1. In the illustrated embodiment, the front-end system is in communication with multiple client devices 125 a-c and multiple external caches 145 a-c operating on multiple back-end cache systems 150 a-c.

In one embodiment, the front-end system 102 includes a server TCP write lock 205 for managing access to connections between client devices and server threads 210 on the front-end system. In one embodiment, a lock (also called a mutex) is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. A lock can be used to enforce a mutual exclusion concurrency control policy. In one embodiment, reading ends of a server TCP end-point are acted on by only a single event loop while writing ends may be acted on by multiple loops or threads.

In some embodiments, the server threads 210 receive cached data from the distributed caching system 100 and communicate the requested cached data to the client devices 125 a-c. The front-end system 102 may be running multiple event loops 215 a-c or threads to provide these caching services. In addition, in some embodiments, the front-end system 102 can provide various services to users in addition to caching services. For example, such services can include storage of data, performing computations, serving web pages or the like. Some of the server threads 210 may be used to provide these services. The services may be different services or instances of the same service.

In some embodiments, the event loops 215 a-c are I/O event driven loops. In one embodiment, the front-end system 102 stripes established connections over a small set of threads, each thread running its own private event loop. For example, the system 102 can associate a set of multiple threads to multiple network connections. When a request arrives on one of the network connections, the request can be assigned to one of the associated multiple threads. This design reduces the number of wakeups needed to process a single request, though it may increase the number of wakeups that the operating system kernel needs to perform for a small number of concurrent requests. When processing a large number of concurrent requests, however, the number of wakeups would be lower with the above configuration. Another advantage of the above configuration is that the state machine that represents a single client is not required to be thread-safe. Buffers also do not migrate between threads, which, depending on the memory allocator in use, can result in significant reductions in heap growth and allocation time.

In one embodiment, the front-end system uses a custom event dispatch loop. In one embodiment, each event dispatch loop is bound to a thread and sockets are hashed to one of many loops. Well-behaved handlers have the ability to yield the event loop to prevent starvation of requests from other sockets. In some embodiments, the custom event dispatch loop is configured to handle socket events, signals and/or timers.
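Hashing sockets to event loops can be sketched as a stable mapping from a connection to a loop index, so that every event for a given connection is handled on the same thread. The function names loop_for_socket and assign, the use of the remote address and port as the hash input, and the choice of CRC-32 are assumptions made for this example.

```python
import zlib
from typing import Dict, List, Tuple

Peer = Tuple[str, int]  # (remote address, remote port)


def loop_for_socket(peer: Peer, num_loops: int) -> int:
    """Pick the index of the event loop that owns a connection."""
    key = f"{peer[0]}:{peer[1]}".encode()
    # crc32 is stable across runs, unlike Python's salted built-in hash().
    return zlib.crc32(key) % num_loops


def assign(peers: List[Peer], num_loops: int) -> Dict[int, List[Peer]]:
    """Group connections by owning loop; each loop would run on one thread."""
    loops: Dict[int, List[Peer]] = {i: [] for i in range(num_loops)}
    for peer in peers:
        loops[loop_for_socket(peer, num_loops)].append(peer)
    return loops
```

Because a connection never changes loops, per-connection state does not need to be thread-safe, which matches the advantage described above.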

In some embodiments, the services are performed by server threads 210 or other server processes (e.g., an application or virtual machine instance). In one embodiment, in order to minimize memory contention, server event handlers are only invoked via a single thread. Additionally, run-time statistics can be maintained using thread-local counters. In one embodiment, the server and client interactions are designed to maximize or increase flow control effectiveness using the techniques described above. The server process can include command parsers configured to minimize or reduce buffer allocation and copies. For each client, the server process can dedicate a small user-space queue that helps mitigate socket contention on the writing end. If the queue is full then the event handler for the socket can yield, so that other clients may be serviced. A yielded event handler can be restarted after all other events have been processed.
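The per-client queue and the yielding behavior can be sketched as follows. The names ClientWriter and handle_events, and the queue limit of four replies, are hypothetical and chosen only to illustrate the idea of a handler that stops when the client's queue is full.

```python
from collections import deque
from typing import Callable, Deque


class ClientWriter:
    """Hypothetical small user-space queue of replies for one client."""

    def __init__(self, max_queued: int = 4):
        self.max_queued = max_queued  # assumed limit
        self.queue: Deque[bytes] = deque()

    def try_enqueue(self, reply: bytes) -> bool:
        if len(self.queue) >= self.max_queued:
            return False  # queue full: the caller should yield
        self.queue.append(reply)
        return True

    def drain(self, send: Callable[[bytes], None]) -> int:
        # Write queued replies to the socket; returns how many were sent.
        sent = 0
        while self.queue:
            send(self.queue.popleft())
            sent += 1
        return sent


def handle_events(writer: ClientWriter, pending: Deque[bytes]) -> bool:
    """Queue pending replies; return True if the handler yielded early."""
    while pending:
        if not writer.try_enqueue(pending[0]):
            return True  # yield so that other clients can be serviced
        pending.popleft()
    return False
```

A dispatcher would re-run a yielded handler after the other ready events have been processed, once drain has made room in the queue.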

As discussed above, the front-end system 102 can include an internal in-memory cache 220 for storing data items on the front-end system. When server threads 210 need access to data items to fulfill a request, they can check the cache 220 to see if the data items have already been stored there. In the case of a cache miss, the system 102 can check the back-end cache systems of the distributed caching system 100 for the data items. As even networked caches are generally faster than accessing a primary storage device (e.g., a hard drive or database), the system 102 can improve performance by attempting to use resources from the external caches 145 a-c, if available.

In one embodiment, the front-end system 102 includes a DCS client 225 that handles requests that resulted in a cache miss in the internal cache 220. The DCS client 225 can identify a backend cache system on which the requested data items may be stored. In one embodiment, the DCS client 225 generates or determines a key based on the requested data item and uses a hash function to identify a candidate backend system that may contain the requested data item. In one embodiment, the DCS client includes or has access to a list of backend cache systems on the distributed caching system 100, including the network location (e.g., IP addresses) of the backend cache systems.

In some cases, the candidate backend system may not store the data item, for example because the data item timed out, because the data item has not been requested before, or because the data item was displaced from the cache by newer data items. In those cases, the network cache request results in another cache miss and the front-end system 102 can obtain the requested data item from a primary storage device. However, if the candidate backend system does include the data item, then the candidate backend system can return the data item from its internal cache (e.g., a cache hit) to the front-end system.

The DCS client 225 may take advantage of the separation between the reading and writing halves of a TCP socket when it dispatches requests to and/or receives responses from the backend cache systems. As discussed above, in one embodiment, any requests that cannot be fulfilled by the local cache 220 are delegated to the locally running DCS client 225. The front-end system 102 can continue to serve requests that have arrived after the delegated request. Once a reply arrives, the DCS client can directly write the reply to the writing half of the originating client's TCP socket, as well as save the reply to the local cache 220. In one embodiment, if the originating client disconnects, the system 102 still saves the result to the local cache 220 so that it can be available for future transactions, such as if the originating client reconnects.

In one embodiment, the DCS client 225 uses a single TCP socket per end-point. For example, the distributed caching system 100 can multiplex multiple requests to a back-end cache system over the single TCP socket. For example, when the client 225 is processing a first request, it can determine if other requests are pending for the same back-end cache system and aggregate those requests over the single TCP socket. By doing so, the DCS client 225 can reduce the number of active connections any single front-end or back-end cache system would need to handle. As concurrent requests to the same end-point can be aggregated, this can also reduce the number of outbound network packets, reducing load on the network. In addition, using a single TCP socket can result in faster TCP ramp-up and potentially offers better network utilization by allowing more aggressively pipelined requests on the fly. In cases where the system 102 provides virtual machine instances, having a single TCP socket per end-point frees users of the virtual machine instances from having to manually tune the size of the connection pool.

The DCS client 225 can use a callback model to notify callers of completion of requests. In some cases, aggregated requests over a single connection may be received and processed in a different order than when sent. Requests can be written directly to the socket by the DCS client if the socket is writable and an event loop 215 a-c is not concurrently writing. This can save the delay incurred from waking an event loop to write the request to the TCP socket connected to the requesting client device.

The front-end system 102 can also include a client TCP write lock 230 for controlling access to TCP connections between the backend cache systems 150 a-c and the front-end system 102. In one embodiment, the front-end system 102 shuts down TCP connections (e.g., to client devices or backend systems) that are deemed inactive, in order to control the number of active TCP connections existing concurrently. Many operating systems have limits on the number of active TCP connections, and by managing the connections, the front-end system 102 can enhance its responsiveness to requests (e.g., by not having to wait for TCP connections to become available if the limit is reached).

FIG. 2B schematically illustrates an example data flow between an embodiment of the front-end system 102 of FIG. 2A and the client devices 125 a-c during write operations. In the illustrated figure, client devices and the respective TCP connections associated with the client devices are associated with a particular server TCP write lock. For example, in FIG. 2B, lock 205 a is associated with client device 125 a, lock 205 b with client device 125 b and lock 205 c with client device 125 c. In one embodiment, in order for an event loop to communicate with the particular client device by writing on the TCP connection, the event loop first acquires the write lock associated with the particular client device. In some embodiments, while writing to the connections from the server side requires the use of a server write lock 205, client devices can freely write to the connections without using locks.

As discussed above, in some embodiments, in order to improve throughput, a set of event loops is assigned or striped to a particular client device 125 a. When a data item is received from the distributed caching system 100 that needs to be sent to the client device 125 a, one of the event loops can be dynamically assigned to process the data item. For example, if other event loops are busy servicing other clients, the least busy event loop 215 a may be selected. The event loop 215 a can then acquire the server lock 205 a associated with the client device 125 a so that another event loop does not attempt to send the same data item to the client device 125 a. The use of the server write locks can ensure that effort is not duplicated between the event loops. In some embodiments, the client TCP write locks 230 of FIG. 2A also serve a similar purpose for connections with the back-end cache systems 145 a-c.
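As a hedged illustration only, the role of a per-client server write lock might be sketched as below. The class `ClientConnection`, the method `send_once`, the `delivered` set and the in-memory `wire` list (standing in for the TCP connection) are all hypothetical names introduced for this sketch. The disclosure states only that an event loop acquires the lock before writing.

```python
import threading


class ClientConnection:
    """Hypothetical stand-in for one client's TCP connection and its write lock."""

    def __init__(self):
        self.write_lock = threading.Lock()  # plays the role of a server write lock 205
        self.delivered = set()              # ids of data items already written
        self.wire = []                      # stand-in for bytes written to the socket

    def send_once(self, item_id: str, payload: bytes) -> bool:
        """Write the payload unless another event loop has already sent this item."""
        with self.write_lock:
            if item_id in self.delivered:
                return False  # another loop got here first; do not duplicate the write
            self.wire.append(payload)
            self.delivered.add(item_id)
            return True
```

In this sketch, several event-loop threads may race to deliver the same item, but only the first one to take the lock writes it.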

Examples of Caching Processes

FIGS. 3A and 3B schematically illustrate embodiments of the data flow between various components of the distributed caching system 100. FIG. 3A illustrates an in-memory cache hit event. At event 1, the client device 125 sends a request to the front-end system 102. At event 2, the front-end system 102 searches and finds the requested data item in its internal cache. At event 3, the front-end system provides the requested data item to the client device 125.

FIG. 3B illustrates an in-memory cache miss event that results in a cache hit on a networked cache. At event 1, the client device 125 sends a request to the front-end system 102. At event 2, the front-end system 102 searches and fails to find the requested data item in its internal cache. It then identifies an external cache 145 a designated to handle that request and sends the external cache 145 a the data item request. At event 3, the external cache 145 a searches and finds the requested data item in its internal cache. It then sends the requested data item to the front-end system 102. At event 4, the front-end system 102 saves the data item to its own cache and sends the requested data item to the client device 125.

FIG. 4 schematically illustrates a logical flow diagram for a lookup routine 400 to look up data items in the distributed caching system 100 of FIG. 1. In some implementations, the routine is performed by embodiments of the distributed caching system 100 described with reference to FIG. 1 or by another component of the system 100, such as the front-end system 102. For ease of explanation, the following describes the routine as performed by the front-end system. The routine is discussed in the context of an example scenario that is intended to illustrate, but not to limit, various aspects of the distributed caching system 100.

Beginning at block 405, the front-end system 102 receives a first request for a first data item. For example, the first request may come from a client device 125 or a server thread 210 handling a request from the first client device, with the request received on an inter-process communication interface (e.g., a syscall, API, etc.). In some cases, the request is received on a networking interface of the front-end system 102 as part of network packets sent over the network 120 a by the client device 125. A thread can then process the request and provide the request to the inter-process communication interface. One example transaction is a web page or web content request, where a browser on the client device 125 is requesting a web page and the distributed caching system 100 provides caching for a web server. The web server or a thread associated with the web server, after receiving the request, can request that the front-end system 102 provide the first data item. Other types of transactions may also be handled by the distributed caching system 100. As discussed above, caching data items can speed up a wide variety of data transactions.

At block 410, the front-end system 102 determines whether the first data item is stored in local in-memory cache 220 (sometimes referred to herein as an internal cache or a local cache). In one embodiment, the front-end system identifies a first key corresponding to the first data item. For example, the distributed caching system 100 can obtain or derive some type of identifier (e.g., session ID, user ID, URL, filename, etc.) from the first request. The front-end system 102 can then determine whether the key and a value associated with the key are in the cache. For example, the cache can include a hash table or similar structure and the system 102 checks if the hash table includes an entry for the key by applying a hashing function to the key.

At block 415, based on the determination, the front-end system 102 proceeds to block 420 or block 435. In some cases, the first data item is in the local in-memory cache (e.g., a cache hit) and the system proceeds to block 435. In some cases, the first data item is not in the local in-memory cache (e.g., a cache miss) and the system proceeds to block 420.

At block 420, the front-end system 102 identifies, from a plurality of external caches, an external cache designated to store the first data item. In one embodiment, the system 102 applies a hashing function (e.g., consistent hashing) or other deterministic function to the data item request or an identifier derived or generated from the data item request to identify the external cache. In one embodiment, the same data item request maps to the same external cache; thus, if the first data item has previously been requested, the external cache may still have the first data item cached.
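For illustration, one common form of consistent hashing that could serve at block 420 is a hash ring with virtual nodes, sketched below. The disclosure names consistent hashing only as an example; the `HashRing` class, the MD5-derived 64-bit hash, the `replicas` count and the cache names used here are assumptions of this sketch rather than details of the described embodiments.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Derive a 64-bit integer position on the ring from a string."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring; each cache is placed at several ring positions."""

    def __init__(self, nodes, replicas: int = 64):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the cache owning the first ring position after the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

A property of this arrangement is that the same key always maps to the same external cache, and removing one cache only remaps the keys that cache owned; keys assigned to the remaining caches stay where they were.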

At block 422, the front-end system 102 determines whether the external cache is storing the first data item. In some cases, the first data item may not be on the external cache. For example, the first data item may not have been requested before; thus the first data item has not yet been loaded onto the external cache from permanent storage. If the data item is not on the external cache, the routine 400 proceeds to block 440. If the data item is on the external cache, the routine proceeds to block 425.

At block 425, the front-end system obtains the first data item from the external cache, assuming the external cache has the data item. In some embodiments, if the external cache does not have the first data item stored, the back-end cache system on which the external cache resides or the front-end system 102 can obtain the first data item from a primary data storage. In some cases, the primary data storage may be a mass storage device on the same computing system or computer as the front-end system 102. In some cases, the primary data storage may be a storage device located on the network 120 b. For example, the primary data storage may be part of an external database or server. In other embodiments, the front-end system provides a notice of a cache miss rather than obtaining the first data item from the primary data storage.

At block 430, the front-end system 102 stores the first data item in its in-memory cache. In some cases, the cache may be full and the front-end system 102 identifies storage blocks in the cache that can be replaced with the first data item. For example, the system 102 may identify the least recently used blocks and then select those blocks for replacement. However, as discussed above, many different cache algorithms can be used to identify which storage blocks on the cache to replace. The routine 400 can then proceed to block 440.
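As one non-limiting example of the replacement step at block 430, a least-recently-used policy might be sketched as follows. The `LRUCache` class and its `capacity` parameter are hypothetical; the sketch counts whole entries rather than storage blocks, and, as noted above, other cache algorithms could be substituted.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal in-memory cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, most recent last

    def get(self, key, default=None):
        """Return the cached value and mark the entry as recently used."""
        if key not in self._items:
            return default
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value) -> None:
        """Store a value, evicting the least recently used entry if over capacity."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry
```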

At block 440, the front-end system 102 provides the results (e.g., first data item or a cache miss result). For example, the front-end system 102 can provide the first data item to the source of the request (e.g., a client device, a web server, a requesting thread, another service on the same computer or the like) if the first data item was found on the internal cache or the external cache. If the first data item was not found, either on the internal cache or the external cache, the front-end system can inform the requestor (e.g., a requesting thread) that the cache request resulted in a cache miss. The requesting thread can then try to obtain the data from the primary data storage.

At block 445, the front-end system 102 optionally provides the first data item to other requestors. In many situations, there can be multiple requestors waiting for the same data item. For example, if the first data item is a news article hosted on a web site, many users of the web site may have requested the news article, particularly if the article is popular. In such situations, the front-end system can, in response to receiving the first data item, determine if other requestors are waiting for the first data item. For example, the front-end system can obtain or generate a key (e.g., using a hash function) associated with the first request. The system 102 can then determine if additional outstanding requests are associated with the same key, for example, by generating keys for those requests and checking if those keys match the key for the first request. The front-end system can then provide the first data item to all or some of the outstanding requests. By checking if the first data item is responsive to other requests, the front-end system 102 can reduce the latency of responses to those other requests because the other requestors can piggyback off the response to the first requestor. The routine 400 can then end.

For illustrative purposes, an example scenario describes one embodiment of a response duplication process in additional detail. In the scenario, the front-end system 102 obtains a first data item from a first cache (e.g., internal or external cache), the first data item responsive to a first request from a first requestor (e.g., a client device or a thread). Meanwhile, a second request for the first data item is received from a second requestor (e.g., a client device or a thread) before the first data item is retrieved. When the first data item is received, the front-end system can determine that the second request is pending and can subsequently send the first data item to the first requestor and the second requestor.

In some embodiments, the front-end system, in response to receiving the second request, sends out a second cache request for the first data item if the first cache request responsive to the first request is still outstanding. That is, there may be two (or more if additional requests are received) outstanding cache requests. In such cases, the front-end system can process whichever cache response (for the first cache request or second cache request) is received back first and provide the first data item to the first and second requestors. When the trailing cache response arrives, the front-end system can discard that cache response since the first data item has already been provided.
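The response duplication described above might be sketched, under stated assumptions, as a table of pending requestors keyed by the data item's key. The `PendingRequests` class and its `add` and `complete` methods are hypothetical names chosen for this sketch, and requestors are modeled as simple callbacks.

```python
from collections import defaultdict


class PendingRequests:
    """Tracks requestors waiting on the same key so one response can serve them all."""

    def __init__(self):
        self._waiters = defaultdict(list)

    def add(self, key: str, callback) -> bool:
        """Register a requestor; return True if it is the first one waiting on the key."""
        first = key not in self._waiters
        self._waiters[key].append(callback)
        return first

    def complete(self, key: str, value) -> int:
        """Deliver the value to every waiting requestor and return how many were served."""
        callbacks = self._waiters.pop(key, [])
        for callback in callbacks:
            callback(value)
        return len(callbacks)
```

In this sketch, whichever cache response arrives first serves every waiting requestor. A trailing response for the same key finds no waiters, so `complete` returns 0 and the response can simply be discarded.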

Referring back to block 415, if the in-memory cache includes the first data item, the routine 400 proceeds to block 435. At block 435, the front-end system 102 obtains the first data item from the cache. The routine 400 can then proceed to block 440 and block 445, perform the operations described above, and then end.
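Taken together, blocks 410 through 440 of the routine 400 might be sketched as the following two-tier lookup. This is a simplified illustration only: plain dictionaries stand in for the local cache 220 and the external caches, a CRC32 modulo stands in for whatever deterministic function selects the designated cache, and the names `MISS`, `pick_external` and `lookup` are assumptions of the sketch. Block 445 (serving other requestors) is omitted.

```python
import zlib

MISS = object()  # sentinel returned when neither cache tier holds the key


def pick_external(key: str, external_caches: dict) -> dict:
    """Block 420: deterministically choose the external cache designated for a key."""
    names = sorted(external_caches)
    return external_caches[names[zlib.crc32(key.encode("utf-8")) % len(names)]]


def lookup(key: str, local_cache: dict, external_caches: dict):
    """Return the cached value for key, or MISS if no cache tier has it."""
    if key in local_cache:                          # blocks 410/415: local hit
        return local_cache[key]                     # blocks 435 and 440
    external = pick_external(key, external_caches)  # block 420
    if key not in external:                         # block 422: external miss
        return MISS                                 # block 440: report the miss
    value = external[key]                           # block 425
    local_cache[key] = value                        # block 430: populate local cache
    return value                                    # block 440
```

A caller receiving `MISS` would then fall back to the primary data storage, consistent with the description of block 440 above.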

Session State Caching

FIG. 5 illustrates a distributed services system 500 that uses an embodiment of the distributed caching system 100 of FIG. 1 to speed up transport layer security (TLS) transactions or other types of authentication transactions by caching session state identifiers. In one embodiment, the distributed services system 500 includes one or more request distributor systems or modules 505, a proxy fleet 502 having one or more proxy systems 510 and a distributed caching system 100 having one or more cache systems 150 a-c. The various components (e.g., the request distributor module, proxy servers, cache systems, etc.) can include physical or virtual computing systems having or associated with processor(s), computer memory, network interface(s) and/or mass storage devices. In some embodiments, the distributed services system 500 operates on one or more data centers in one or more geographic regions in one or more different time zones. For example, the distributed services system 500 may be operated by a computing services provider that provides computing resources to customers, such as on an as-needed or on-demand basis.

In the illustrated embodiment, the request distributor 505 is configured to load balance or otherwise distribute workload to the proxy fleet 502. In one embodiment, load balancing is a computer networking methodology to distribute workload across multiple computers or a computer cluster, network links, central processing units, disk drives, or other resources, to achieve efficient resource utilization, increase throughput, reduce response time and avoid overload. In addition, using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. The request distributor 505 can include dedicated software or hardware, such as a computing system, multilayer switch or a Domain Name System server.

In one embodiment, the proxy systems 510 are similar to or include at least some of the functionality of the front-end systems of FIG. 1. In the illustrated embodiment, the proxy systems include a local cache 145 and a services module 512 for providing user services, such as processing Secure Sockets Layer (SSL) or TLS connection requests. For example, the proxy system 510 may include an application or virtual machine instance configured to provide some service, such as storage, computation and/or web hosting. As discussed above, the proxy fleet 502 can include physical computing systems and/or virtual computing systems operating on physical computing systems. For example, the proxy fleet 502, in one embodiment, can be implemented as a group of virtual machine instances operating on computers in a data center.

Transport Layer Security (TLS) is a cryptographic protocol that provides secure communications over the Internet. TLS encrypts the segments of network connections at the Application Layer for the Transport Layer, using asymmetric cryptography for key exchange, symmetric encryption for confidentiality, and message authentication codes for message integrity.

TLS is the successor to SSL, another cryptographic protocol. Several versions of the TLS protocols are in use in applications such as web browsing, electronic mail, Internet faxing, instant messaging and/or voice-over-IP (VoIP). Transport Layer Security, formerly SSL, is an Internet Engineering Task Force (IETF) standards protocol, first introduced as IETF Request for Comments (RFC) 2246 in January 1999. Since that time, there have been numerous informational and standards track RFCs introducing new versions, options and extensions, including, but not limited to, RFC 2818, 3436, 3546, 4346, 4347, 5077, 5216, 5246, 5487, 6066, 6091. A specific embodiment herein is related to TLS session re-use, as introduced in RFC 5077.

The following describes the operation of an embodiment of the TLS protocol in which a client is authenticated via TLS using certificates exchanged between both server and the client during a handshake process. In TLS, a server proves its identity to the client. The client may also need to prove its identity to the server. Public-key infrastructure (PKI), the use of public/private key pairs, is the basis of this authentication. The exact method used for authentication is determined by the cipher suite negotiated.

In one embodiment of TLS, the client and server exchange random numbers and a special number called the Pre-Master Secret. These numbers are combined with additional data permitting client and server to create their shared secret, called the Master Secret. The Master Secret is used by client and server to generate the write MAC secret, which is the session key used for hashing, and the write key, which is the session key used for encryption. The client and server can then exchange application data over the secured channel they have established. All messages sent from client to server and from server to client are encrypted using the session key.

Authentication can be computationally complex. For example, authentication can include public key operations (e.g., RSA) that are relatively expensive in terms of computational power. The TLS protocol provides a secure shortcut in the handshake mechanism to avoid at least some operations. In an ordinary full handshake, a session ID (or other session state identifier) is generated for the session and stored by the server. In one embodiment, the client associates this session ID with the server's IP address and TCP port, so that when the client connects again to that server, it can use the session ID to enable a shorter handshake process. In the server, the session ID maps to the cryptographic parameters previously negotiated and/or other session data from the previous session, such as the “master secret”. In one embodiment, both sides must have the same “master secret”; otherwise the resumed handshake will fail (beneficially, this can prevent an eavesdropper from using a session ID). This type of resumed handshake is sometimes called an abbreviated handshake or a restart handshake.

Referring back to the distributed caching system 100, in one embodiment, the distributed caching system 100 acts as a shared caching system for the proxy fleet 502, enabling TLS sessions or other secured sessions to be resumed or session data to be otherwise reused. By utilizing a shared caching system, client devices can be transparently transferred to other proxy systems, allowing different proxy systems to serve client requests, even if a particular client device authenticated with a first proxy system but then is subsequently assigned to a second, different proxy system (e.g., during a second session). By allowing different proxy systems to handle subsequent requests from a particular client device, the distributed services system 500 can better manage resources, for example, by allowing subsequent requests to be assigned to less busy proxy systems 510.

For example, after the user is authenticated with the first proxy system, the first proxy system can save the session ID into a shared cache system 150 a-c of the distributed caching system 100 so that, when the second proxy system handles another transaction from the client, it can obtain the session ID from a shared cache in the distributed caching system 100. Generally, TLS authentication is computationally complex, such that reading the session ID from a cache instead of re-authenticating can improve system performance and reduce latency.

FIG. 6 illustrates an embodiment of the data flow in the distributed services system 500 of FIG. 5 during an example embodiment of a TLS authentication transaction.

Starting at event 1, the client device 125 begins a first TLS transaction and sends a request to the request distributor 505. At event 2, the request distributor 505 determines which proxy system to assign the request to for processing. For this example, the request distributor assigns the request to a first proxy system 510 a. As the client device 125 is not resuming an existing TLS session in this example, it goes through TLS authentication with the first proxy system 510 a. During the authentication, a session ID is generated. At event 3, the first proxy system 510 a identifies a designated cache system from the distributed caching system 100 for storing the session ID. As discussed above with reference to FIG. 1, the designated cache system can be selected based on a hashing function (e.g., consistent hashing). For purposes of this example, the designated cache system is cache system 150 c. However, in some cases, the designated cache system could be any cache system of the distributed caching system 100. At event 4, after the authentication is completed, the first proxy system 510 a can provide the requested service and complete the first transaction. In some embodiments (4 a), the first proxy system 510 a responds directly to the client device 125, while in some embodiments (4 b), the first proxy system 510 a provides a response to the request to the request distributor 505, which then provides the response to the client device 125.

At event 5, the client device 125 begins a new, second transaction. Some time may have passed since the first transaction, and the request distributor 505, at event 6, assigns the second transaction to the second proxy system 510 b. The second proxy system 510 b can attempt to speed up the authentication process by checking whether a session ID is stored for the client device 125, indicating the device 125 was recently authenticated. At event 7, the second proxy system 510 b determines if the session ID is cached in the distributed caching system 100. In one embodiment, the second proxy system 510 b checks its local cache for the session ID, and, if not found there, it identifies the designated cache 150 c and requests the session ID from the external, designated cache system 150 c. In one embodiment, the second proxy system 510 b checks the local and external caches by performing a routine that is similar to the routine 400 described with reference to FIG. 4. At event 8, the second proxy system 510 b receives the session ID from the cache system 150 c, assuming the session ID is still stored by the system 150 c. The second proxy system 510 b can then complete authentication of the client device 125. At event 9, the second proxy system 510 b performs the requested function and completes the second transaction. In some implementations (9 a), the second proxy system 510 b responds directly to the client device 125, while in other implementations (9 b), the second proxy system 510 b provides a response to the request to the request distributor 505, which then provides the response to the client device 125.

FIG. 7 schematically illustrates a logical flow diagram for a lookup routine 700 to look up data items in the distributed services system 500 of FIG. 5. In some implementations, the routine is performed by components of the distributed services system 500 described with reference to FIG. 5, such as a cache system 150 or proxy system 510. For ease of explanation, the following describes the routine as performed by a proxy system 510. The routine is discussed in the context of an example scenario that is intended to illustrate, but not to limit, various aspects of the distributed services system 500. In one example scenario, the routine 700 is performed by a proxy system 510 attempting to reuse existing negotiation data from a previous connection establishment transaction.

Beginning at block 705, the proxy system 510 receives a TLS connection request from a first client device. For example, the first request may come from a client device 125, with the request transmitted over the network 120 a, received and routed by a request distributor 505, and sent to a networking interface or another interface (e.g., an application programming interface or application) of the proxy system 510. Assuming the client device 125 was previously authenticated, a session state identifier (e.g., a session ID or other identifier associated with session state data) may still be stored on the distributed caching system 100. The proxy system 510 can search for the cached session ID rather than performing a re-authentication. By doing so, the system 510 can improve performance and reduce latency.

At block 710, the proxy system 510 determines whether a session state identifier and/or session data is stored on the system's local cache 145. For example, the distributed services system 500 can obtain or derive some type of key or identifier (e.g., user ID, URL, filename, etc.) from the first request. It can then determine whether the key and a value associated with the key are in the cache. For example, the cache can include a hash table or similar structure and the proxy system 510 checks if the hash table includes an entry for the key by applying a hashing function to the key.

At decision block 715, based on the determination, the proxy system 510 proceeds to block 720 or block 735. In some cases, the session state identifier is in the cache (e.g., a cache hit) and the system proceeds to block 735. In some cases, the session state identifier is not in the cache (e.g., a cache miss) and the system proceeds to block 720.

At block 720, the proxy system 510 identifies, from a plurality of external caches, an external cache (e.g., a cache system 150 of the distributed caching system 100) designated to store the first data item. In one embodiment, the proxy system 510 applies a hashing function (e.g., consistent hashing) to the data item request or an identifier derived or generated from the data item request to identify the external cache. In one embodiment, the same data item request maps to the same external cache; thus, if the first data item has previously been requested, the external cache may still have the first data item cached.

At decision block 722, the proxy system 510 determines whether the session state identifier is stored on the external cache. In some cases, the session state identifier is in the external cache (e.g., a cache hit) and the system proceeds to block 725. In some cases, the session state identifier is not in the cache (e.g., a cache miss) and the system proceeds to block 745.

At block 725, the proxy system 510 obtains the session state identifier from the external cache. In some cases, the session state identifier is associated with additional session data and, in some embodiments, the proxy system 510 obtains the session data with the session state identifier from the external cache. In some cases, the session state identifier is sufficient to allow the proxy system 510 to authenticate the client device without additional session data.

At block 730, the proxy system 510 stores the session state identifier in the system's local cache. In some embodiments, the proxy system 510 stores additional session data on the local cache.

At block 740, the proxy system 510 authenticates the TLS connection. The proxy system 510 may then securely communicate with the client device 125 and can receive and fulfill requests from the client device. The routine 700 can then end.

Referring back to block 715, if the local cache includes the session state identifier, the routine 700 proceeds to block 735. At block 735, the proxy system 510 obtains the session state identifier from the local cache. The proxy system may also obtain additional session data associated with the session state identifier. The routine 700 can then proceed to block 740, perform the operations described above, and then end.

Referring back to block 722, if the external cache does not store thesession state identifier, the routine 700 proceeds to block 745. Atblock 745, the proxy system 510 performs a full handshake or otherwisere-authenticates the client device. For example, the proxy system 510can treat the request as a new request and go through the complete TLSconnection setup process. The routine 700 can then end.
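The decision flow of blocks 715 through 745 can be summarized in a short sketch. The function name, the use of plain dictionaries as stand-ins for the local and external caches, and the `full_handshake` callback are hypothetical choices made for illustration; they do not describe an actual implementation of the proxy system 510.

```python
def resume_or_handshake(session_id, local_cache, external_cache, full_handshake):
    """Return (session_data, path) following the flow of routine 700.

    local_cache and external_cache are dict-like stand-ins (an assumption);
    full_handshake is a hypothetical callback for the complete TLS setup.
    """
    # Block 715 -> 735: the local cache already holds the session state.
    if session_id in local_cache:
        return local_cache[session_id], "local"

    # Blocks 720-722: consult the designated external cache.
    session_data = external_cache.get(session_id)
    if session_data is not None:
        # Blocks 725-730: obtain the state and keep a local copy.
        local_cache[session_id] = session_data
        return session_data, "external"

    # Block 745: nothing cached, so fall back to a full handshake.
    return full_handshake(session_id), "full"


local = {}
external = {"abc": {"cipher": "example-suite"}}
new_session = lambda sid: {"cipher": "negotiated"}

first = resume_or_handshake("abc", local, external, new_session)
second = resume_or_handshake("abc", local, external, new_session)
unknown = resume_or_handshake("zzz", local, external, new_session)
```

In this sketch the first lookup is served by the external cache, the second by the local copy stored at block 730, and an unknown identifier falls through to the full handshake.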

Multiple Breakpoints for Security Handshakes

FIG. 8 schematically illustrates an embodiment of a data flow in a distributed caching system configured to enable resumption of security handshake transactions. In one embodiment, negotiation data between a client device and a server during different points in the SSL/TLS handshake negotiation (or other secure transportation protocol) are stored on the distributed caching system 100. By saving the negotiation data, such as the TLS negotiation data described above, during different points, the distributed caching system 100 can enable additional breakpoints in secure transportation protocol handshake transactions where the negotiations can be resumed. For example, in the case of a packet time out, the negotiation can be resumed with the same server or even a different server when the client device attempts to reconnect.

This technique can be particularly beneficial in systems, such as the distributed services system 500 of FIG. 5, where a server is dynamically selected from a group of servers to respond to requests from client devices. In such a system, a client reconnecting to the distributed services system 500 can be assigned to a different server than it was originally connected to before the initial handshake transaction was interrupted. However, by saving the negotiation data to the distributed caching system 100, a newly assigned second server can obtain the saved negotiation data and transparently resume the handshake operation, even if the secure transportation protocol does not support a resumption feature.

In event 1, the client sends a client hello message to server 1. In event 2, server 1 then responds to the client hello message with a server hello message. In event 3, server 1 also transmits the handshake data received and/or sent by server 1 to the distributed caching system 100.

In event 4, server 1 sends a certificate to the client. Such certificates can be used in the calculation of encryption keys. At event 5, server 1 updates the handshake data saved in the distributed caching system 100. At this point, an interruption event occurs. For example, packets sent between the client and server 1 may have been dropped or lost, causing a time out. Other possible interruptions include crashes, restarts or other failure events.

In event 6, the client sends a certificate or otherwise responds to the previous message from server 1. For example, the client can send the next message required by the protocol or the client can resend the previous message if no response was received from server 1 due to the interruption event.

In the illustrated scenario, the client is assigned to server 2 subsequent to the interruption event. For example, server 1 may have crashed and is still not available to resume handling transactions. Server 2 retrieves the handshake data stored by server 1 from the distributed caching system. In event 6, the client and server 2 complete the handshake.

In another scenario (not shown), the client is again assigned to server 1 after the interruption event. However, in this scenario, server 1 lost any local copies of the handshake data after the interruption event. For example, server 1 may have crashed or rebooted, losing non-permanently stored data. In this scenario, server 1 can still retrieve the handshake data from the distributed caching system and complete the handshake transaction with the client.
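The breakpoint behavior described for FIG. 8 can be sketched as servers that checkpoint handshake progress to a shared store and read it back on reconnection. The class, method names, step labels, and the dictionary standing in for the distributed caching system 100 are assumptions for illustration only; real negotiation data would carry protocol state rather than short strings.

```python
class HandshakeServer:
    """Hypothetical server that checkpoints handshake steps to a shared cache."""

    def __init__(self, name, shared_cache):
        self.name = name
        # A dict stands in for the distributed caching system (assumption).
        self.shared_cache = shared_cache

    def record_step(self, client_id, step, payload):
        # Save or update the negotiation data after each handshake message.
        state = self.shared_cache.setdefault(client_id, {"steps": []})
        state["steps"].append({"step": step, "payload": payload, "by": self.name})

    def resume_point(self, client_id):
        # Return the last checkpointed step, or None if a full restart is needed.
        state = self.shared_cache.get(client_id)
        if not state or not state["steps"]:
            return None
        return state["steps"][-1]["step"]


shared = {}
server_1 = HandshakeServer("server-1", shared)
server_2 = HandshakeServer("server-2", shared)

server_1.record_step("client-x", "server_hello", "params")    # events 2-3
server_1.record_step("client-x", "certificate", "cert-data")  # events 4-5
# Interruption occurs; the client is reassigned to server 2.
resume_at = server_2.resume_point("client-x")
fresh_client = server_2.resume_point("client-y")
```

The same lookup also covers the scenario in which server 1 restarts and loses its local copy, since the checkpoint lives in the shared store rather than on the server.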

Multiple Data Center Embodiments

FIG. 9 illustrates an embodiment of a data flow in a distributed caching system having caching systems distributed across multiple data centers or other partition groups. In one embodiment, writes to a cache by a caching system at a first data center are also performed at one or more other caching systems at one or more other data centers. To the extent that data requested at one data center is also requested at other data centers, distributing the cache writes can increase the cache hits at the other data centers. While the following disclosure discusses an embodiment across multiple data centers, other partition groups can also be used. For example, a partition group can include geographic groupings, security groupings, network partitions, virtual machine instances operating on the same computer or multiple computers, one or more cache servers, application clusters and/or other defined groupings of computing resources.

One scenario where spreading cache writes can be beneficial is where a web site is hosted by a distributed services system 500 operating on multiple data centers. For example, the web site may be hosted on data centers on the U.S. east coast and on the U.S. west coast, in order to reduce latency and better serve visitors of the web site across the country. As the east coast and west coast are different geographic regions on different time zones, visitors on the east coast will likely begin visiting the web site earlier than visitors on the west coast. However, if cache updates caused by requests on the east coast data center (which is on an earlier time zone) are propagated to the west coast data center, then, when visitors begin requesting data items from caching systems on the west coast data center, the caching systems will already be pre-populated with useful data items, thereby increasing the chances of cache hits occurring in the caching systems in the west coast data centers.

In one embodiment, while cache writes are propagated across multiple data centers, cache reads remain local to a particular data center. Requesting data from a different data center can be expensive, as network packets may have to be transmitted over significant distances over multiple networks, increasing the chance that packets can be lost. By keeping reads local to the data center, high responsiveness to client requests can be maintained. Cache writes, meanwhile, can be performed separately from responding to client requests; thus, propagating cache writes does not necessarily affect responsiveness to client requests.

Starting at event 1, a first client device 125 a requests a first data item from a first cache system in a first data center 905 a. The first cache system can then check caches (e.g., local caches or external caches) associated with the first cache system in the first data center. Assuming the first data item is not in the caches or if the first data item is not up to date, the first cache system can retrieve the first data item from primary storage.

At event 2, the first cache system can then update the one or more caches (e.g., the local cache) in the first data center 905 a with the first data item. For example, the first cache system can load the first data item into the one or more caches or it can update a stale entry for the first data item in the one or more caches.

At event 3, the first cache system can then provide the first data item to the client device 125 a in response to the original request. At event 4 and event 5, the first cache system can transmit the cache update for the first data item to a second data center 905 b and a third data center 905 c, respectively, in order to cause caches at those data centers to be updated. As discussed above, the cache writes can be performed separately from responding to client requests; thus, in some embodiments, event 4 and/or event 5 are performed before event 3, and vice versa. Referring back to event 4, a second cache system at the second data center 905 b receives the cache update and updates one or more caches associated with the second cache system. Likewise, referring back to event 5, a third cache system at the third data center 905 c receives the cache update and updates one or more caches associated with the third cache system.

At event 6, the second cache system receives a second request for the first data item from a second client device 125 b. As cache(s) in the data center 905 b have been updated as a result of the cache update propagation, the second cache system can find the first data item in the cache(s) at the second data center 905 b. At event 7, the second cache system provides the first data item retrieved from the cache(s) to the second client device 125 b.
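The FIG. 9 data flow can be approximated by a small sketch in which reads stay local and cache writes are pushed to peer data centers. The class and attribute names are hypothetical, dictionaries stand in for the caches and primary storage, and propagation is done synchronously only to keep the sketch short; as noted above, it may be performed separately from answering the client.

```python
class CacheSystem:
    """Hypothetical cache system in one data center (partition group)."""

    def __init__(self, name, primary_storage):
        self.name = name
        self.primary_storage = primary_storage  # stand-in for primary storage
        self.cache = {}                         # stand-in for local cache(s)
        self.peers = []                         # cache systems in other data centers

    def get(self, key):
        # Reads remain local: only this data center's cache is consulted.
        if key in self.cache:
            return self.cache[key], "hit"
        value = self.primary_storage[key]       # event 1: fetch on a miss
        self.cache[key] = value                 # event 2: update local cache
        for peer in self.peers:                 # events 4-5: propagate the write
            peer.apply_update(key, value)
        return value, "miss"                    # event 3: respond to the client

    def apply_update(self, key, value):
        # A received cache update is applied without contacting primary storage.
        self.cache[key] = value


storage = {"item-1": "payload"}
east = CacheSystem("905a", storage)
west = CacheSystem("905b", storage)
third = CacheSystem("905c", storage)
east.peers = [west, third]

east_result = east.get("item-1")   # first request, served from primary storage
west_result = west.get("item-1")   # events 6-7: already cached via propagation
```

In a deployment the propagation loop would more plausibly run in the background or through a queue, so that responding to the first client is not delayed by the remote writes.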

Additional Embodiments

While the above disclosure has discussed specific embodiments of the distributed caching system, many variations of the distributed caching system 100 are possible. For example, different data structures can be used instead of or in addition to the data structures described above. For example, rather than a hash table, the in-memory cache may use a b-tree, b+ tree, binary search tree, skip list or other data structure. In another example, data stored in the distributed caching system 100 may be encrypted or unencrypted.

As described above, the distributed caching system 100 can be implemented with one or more physical servers or other computing machines, such as several computing machines interconnected via a network. Thus, each of the components depicted in the distributed caching system 100 can include hardware and/or software for performing various features. In one embodiment, the distributed caching system 100 is implemented on a computing system that hosts a web site or collection of web sites that the system 100 monitors.

The distributed caching system 100 can include one or more servers for receiving and responding to requests from a network, such as requests to process session records. The one or more servers can include web servers, application servers, database servers, combinations of the same, or the like.

The processing of the various components of the distributed caching system 100 can be distributed across multiple machines, networks and other computing resources. The various components of the distributed caching system 100 can also be implemented in one or more virtual machines, rather than in dedicated servers. Likewise, data repositories can represent physical and/or logical data storage, including, for example, storage area networks or other distributed storage systems. Moreover, in some embodiments the connections between the components shown represent possible paths of data flow, rather than actual connections between hardware. While some examples of possible connections are shown, any subset of the components shown can communicate with any other subset of components in various implementations.

In some embodiments, the distributed caching system 100 may be configured differently than illustrated in the figures above. For example, various functionalities provided by the illustrated modules can be combined, rearranged, added, or deleted. In some embodiments, additional or different processors or modules may perform some or all of the functionalities described with reference to the example embodiment illustrated in the figures above. Many implementation variations are possible.

Other types of interactions (additionally or alternatively) between the distributed caching system 100 and the users and/or user systems are possible in addition to those described above. For example, a distributed caching system 100 interaction can be received directly from a user or administrator (e.g., via an interactive console, web browser or other GUI provided by the distributed caching system 100) or from an executing program. In some embodiments, users may interact with the distributed caching system 100 using other types of interfaces and in other ways.

In some embodiments, the distributed caching system 100 and its components are executed or embodied by one or more physical or virtual computing systems. For example, in some embodiments, a server computing system that has components including a central processing unit (CPU), input/output (I/O) components, storage and memory may be used to execute some or all of the components of the distributed caching system 100. The I/O components can include a display, a network connection to the network 105, a computer-readable media drive and other I/O devices (e.g., a keyboard, a mouse, speakers, etc.). In some embodiments, the distributed caching system 100 may be configured differently than described above.

An embodiment of the distributed caching system 100 can be stored as one or more executable program modules in the memory of the server and/or on other types of non-transitory computer-readable storage media, and the distributed caching system 100 can interact with computing assets over the network 105. In one embodiment, the distributed caching system 100 may have additional components or fewer components than described above. For example, the distributed caching system 100 may be built on top of existing software development or testing systems and designed to coordinate the actions of the existing systems.

First Set of Example Embodiments

In an example embodiment, a system for distributed data caching comprises: (1) an inter-process communication interface for receiving requests, the requests including a first request for a first data item from a first client; (2) computer memory for storing a local cache; and (3) one or more processors configured to: determine whether the first data item is stored in the local cache; in response to determining that the first data item is not on the local cache, determine a first external cache from a plurality of external caches, the first external cache designated to store the first data item; transmit the first request for the first data item to the external cache, the first request sent with a second request for a second data item, the second request determined to be designated for the external cache, the first request and the second request transmitted over a single negotiated streaming protocol connection socket; receive the first data item from the first external cache; and provide the first data item to the first client.

In another example, the above system, wherein the one or more processors are configured to detect inactive connections and close the inactive connections.

In an additional example, the above system, wherein the single negotiated streaming protocol connection socket is a transmission control protocol (TCP) socket or an inter-process communication (IPC) socket.

In another example, the above system, wherein the one or more processors are configured to: associate multiple threads that handle requests from clients to a plurality of network connections to the clients; and dynamically assign a request received on one of the plurality of network connections to one of the associated multiple threads.

In an additional example, the above system, further comprising a distributed caching client configured to: obtain the first data item from the external cache; and transmit the first data item directly on a network connection to the first client without waking a thread assigned to the first client.

In another example, the above system, wherein the one or more processors are configured to determine a first external cache from a plurality of external caches by obtaining an identifier based at least partly on the first request and applying a deterministic algorithm to the identifier such that the deterministic algorithm identifies the first external cache.

In an additional example, the above system, wherein the first data item is received prior to the second data item.

In another example, the above system, wherein the first data item is received after the second data item.

In an example embodiment, a method of caching data comprises: receiving, on a first caching system, a first request for a first data item from a requestor; identifying a first external cache designated to store the first data item from a plurality of external caches; transmitting the first request for the first data item with a second request for a second data item, the second request determined to be designated for the external cache, the first request and the second request aggregated over a single protocol connection to the external cache; receiving the first data item from the first external cache; and providing the first data item to the requestor.

In another example, the above method, wherein the single protocol connection comprises a single negotiated streaming protocol connection. Further, in one example, the above method, wherein the negotiated streaming protocol connection is a Transmission Control Protocol (TCP) connection. Additionally, in another example, the above method, wherein the negotiated streaming protocol connection uses a UNIX domain socket.

In an additional example, the above method, further comprising detecting inactive connections to one or more external caches and attempting to connect to one or more external caches using a background thread of execution.

In another example, the above method, further comprising: associating multiple threads that handle requests from requestors to a plurality of network connections to the requestors; and dynamically assigning a request received on one of the plurality of network connections to one of the associated multiple threads.

In an additional example, the above method, wherein identifying the first external cache from a plurality of external caches comprises: obtaining an identifier based at least partly on the first request; and applying a deterministic function to the identifier, wherein the deterministic function identifies the first external cache.

In another example, the above method, wherein the requestor is a service module configured to handle requests received from a plurality of client devices.

In an additional example, the above method, wherein the requestor is a first client device.

In another example, the above method, wherein the requestor is a thread.

In an additional example, the above method, wherein the local cache stores data using fixed size storage blocks.

In another example, the above method, wherein the local cache stores data using a slab allocation technique or a best fit allocation technique.

In an additional example, the above method, wherein identifying the first external cache designated to store the first data item from a plurality of external caches further comprises: determining whether the first data item is stored in a local cache of the first caching system; and in response to determining that the first data item is not on the local cache, identifying the first external cache from the plurality of external caches. Further, in one example, the above method, further comprising storing the first data item on the local cache.

In an example embodiment, non-transitory computer storage having stored thereon instructions that, when executed by a computer system, cause the computer system to perform operations comprising: identifying a first external cache designated to store a first data item from a plurality of external caches, the first data item requested in a first request; identifying a second request for a second data item destined for the first external cache, the first external cache designated to store the second data item; transmitting the first request and the second request together over a single connection socket to the external cache; and receiving the first data item or the second data item from the first external cache.

In another example, the above non-transitory computer storage, wherein the first request and the second request are multiplexed over a single transmission control protocol (TCP) socket or an inter-process communication (IPC) socket.

In an additional example, the above non-transitory computer storage, wherein the first request and the second request are multiplexed over a first connection protocol for local communications in the computer system and a second connection protocol for network communications.

In another example, the above non-transitory computer storage, wherein the first request is received from a first client device from a plurality of computing devices. Further, in one example, the above non-transitory computer storage, wherein the instructions further cause the computer system to perform operations comprising: detecting inactive connections to one or more external caches; and attempting to connect to the one or more external caches using a background thread of execution. In addition, in one example, the above non-transitory computer storage, wherein the instructions further cause the computer system to perform operations comprising: associating multiple threads that handle requests from the client devices to a plurality of network connections to the client devices; and assigning a request received on one of the plurality of network connections to one of the associated multiple threads.

In an additional example, the above non-transitory computer storage, wherein identifying the first external cache comprises: obtaining an identifier based at least partly on the first request; and applying a deterministic function to the identifier, wherein the deterministic function identifies the first external cache.

In another example, the above non-transitory computer storage, wherein either the first data item or the second data item is received by the computing system first.

Second Set of Example Embodiments

In an example embodiment, a distributed system for reusing secure sessions using non-locally cached data comprises: (1) a plurality of proxy systems configured to provide services to users, each proxy system having a local cache for storing data; (2) a request distribution module configured to: receive requests for secure communications over a network, the requests sent by a plurality of client devices associated with the users, the requests including a first request from a first client device; and assign the first request for secure communications to a first proxy system from the plurality of proxy systems; and (3) the first proxy system comprising a local cache, the first proxy system configured to: determine whether the local cache includes a session state identifier associated with the first request; in response to determining that the local cache does not include the session state identifier, identify a first cache system from a plurality of shared cache systems, the first cache system designated to store data items associated with the first request; determine whether the first cache system includes the session state identifier associated with the first request; in response to determining that the first cache system includes the session state identifier, obtain the session state identifier from the first cache system, wherein the session state identifier was stored on the first cache system as a result of a prior transaction with the first client device; and establish a secure connection with the first client device based at least in part on the session state identifier.

In another example, the above distributed system, wherein the secure connection uses Transport Layer Security (TLS) protocol.

In an additional example, the above distributed system, wherein the session state identifier was stored on the first cache system as a result of the prior transaction with the first client device, the prior transaction occurring between the first client device and a second proxy system different from the first proxy system.

In another example, the above distributed system, wherein the session state identifier was stored on the first cache system as a result of the prior transaction with the first client device, the prior transaction occurring between the first client device and the first proxy system.

In an additional example, the above distributed system, wherein the first proxy system is configured to establish a secure connection with the first client device based at least in part on the session state identifier by obtaining cryptographic parameters associated with the session state identifier and performing an abbreviated handshake operation using at least the cryptographic parameters.

In another example, the above distributed system, wherein the session state identifier comprises a session ID.

In an example embodiment, a method comprises: receiving, on a first computing system, a request for secure communications, the request sent by a first client device, wherein a previous request from the first client device was sent during a first session; identifying a second computing system having a cache, the second computing system designated to store data associated with the request in the cache, the second computing system accessible to the first computing system over a network; obtaining a session state identifier from the second computing system; and establishing, from the first computing system, a secure connection with the first client device based at least in part on the session state identifier.

In an additional example, the above method, wherein obtaining the session state identifier from the second computing system comprises: determining whether the cache of the second computing system includes the session state identifier associated with the request, the session state identifier generated during the first session; and in response to determining that the cache of the second computing system includes the session state identifier, obtaining the session state identifier from the second computing system.

In another example, the above method, wherein the secure connection uses Transport Layer Security (TLS) protocol.

In an additional example, the above method, wherein establishing the secure connection with the first client device comprises resuming the first session.

In another example, the above method, wherein establishing the secure connection with the first client device based at least in part on the session state identifier comprises obtaining session state data associated with the session state identifier and performing an abbreviated handshake operation using at least the session state data.

In an additional example, the above method, further comprising determining that the session state identifier is not stored on a local cache of the first computing system.

In another example, the above method, wherein the first session was between the first client device and a third computing system different from the first computing system and the second computing system, the third computing system configured to transmit the session state identifier to the second computing system for storage on the cache of the second computing system.

In an additional example, the above method, wherein the first session was between the first client device and the first computing system, the first computing system configured to store the session state identifier generated during the first session on the cache of the second computing system.

In another example, the above method, wherein the session state identifier comprises a session ID.

In an additional example, the above method, wherein identifying the second computing system designated to store data associated with the request comprises applying a consistent hashing function to an identifier related to the request. Further, in the above method, the identifier includes an IP address.

In an additional example, the above method, wherein the second computing system is part of a shared caching system.

In another example, the above method, wherein the session state identifier is reused from a prior secure connection.

In an additional example, the above method, wherein the secure connection is resumed from a prior network transaction.

In an example embodiment, a non-transitory computer storage having stored thereon instructions that, when executed by a computer system, cause the computer system to perform operations comprising: receiving, on a first computing system, a request for secure communications, the request sent by a first client device, wherein a previous request from the first client device was sent during a first session; determining whether an external cache includes a session state identifier associated with the request, the session state identifier generated during the first session; in response to determining that the external cache includes the session state identifier, obtaining the session state identifier from the external cache; and establishing, from the first computing system, a secure connection with the first client device based at least in part on the session state identifier.

In another example, the above non-transitory computer storage, wherein establishing a secure connection with the first client device based at least in part on the session state identifier comprises reusing negotiation data from the first session.

In an additional example, the above non-transitory computer storage, wherein the first session was between the first client device and a second computing system different from the first computing system, the second computing system configured to store the session state identifier in the external cache.

In another example, the above non-transitory computer storage, wherein the external cache is a first external cache of a shared cache system comprising a plurality of external caches.

In an additional example, the above non-transitory computer storage, further comprising identifying the first external cache designated to store the session state identifier from the plurality of external caches of the shared cache system.

In an example embodiment, a distributed system comprises: a first proxy system comprising one or more processors and a local cache, the first proxy system configured to: in response to determining that the local cache does not include a session state identifier associated with a request for a secure connection from a client device, identify a first cache system from a plurality of shared cache systems, the first cache system designated to store data items associated with the request; in response to determining that the first cache system includes the session state identifier, obtain the session state identifier from the first cache system, wherein the session state identifier was stored on the first cache system as a result of a prior transaction with the client device; and establish a secure connection with the client device based at least in part on the session state identifier.

Third Set of Example Embodiments

In an example embodiment, a system for distributed data caching comprises: (1) a first caching system in a first partition group, the first caching system configured to: in response to receiving a first cache update for a first data item, update a first cache in the first partition group with the first cache update; and transmit the first cache update for the first data item to a second caching system in a second partition group; and (2) the second caching system in the second partition group, the second caching system configured to: in response to receiving the first cache update transmitted by the first caching system, update a second cache in the second partition group with the first cache update for the first data item; in response to receiving a request for the first data item from a requestor, obtain the first data item from the second cache; and provide the first data item to the requestor.

In another example, the above system, wherein the first partition group and the second partition group are located in different time zones.

In an additional example, the above system, wherein the first partition group and the second partition group are located in different geographic areas.

In another example, the above system, wherein the first cache system and the second cache system are part of a distributed caching system that spans multiple partition groups.

In an example embodiment, a method of caching data in a distributed system, the method comprising: receiving, at a first caching system at a first partition group, an update for a first data item; updating a first cache associated with the first caching system with the update for the first data item, the first cache in the first partition group; and transmitting the cache update for the first data item to a second caching system in a second partition group, thereby causing the second caching system to update a second cache on the second partition group with the cache update for the first data item.

In another example, the above method, further comprising: in response to receiving a request for the first data item at the first caching system, retrieving the first data item from the first cache in the first partition group.

In an additional example, the above method, further comprising: in response to receiving a request for the first data item at the second caching system, retrieving the first data item from the second cache in the second partition group.

In another example, the above method, wherein the first partition group and the second partition group are located in different time zones.

In an additional example, the above method, wherein the first partition group and the second partition group are located in different geographic areas.

In another example, the above method, wherein the first cache system and the second cache system are part of a distributed caching system that spans multiple partition groups.

In an additional example, the above method, wherein receiving the update for the first data item comprises: receiving a request for the first data item; and in response to determining that the first caching system does not store the first data item, retrieving the first data item from primary storage; wherein the first cache update for the first data item comprises adding the first data item.

In another example, the above method, wherein updating the first cache associated with the first caching system comprises loading the first data item into the first cache.

In an additional example, the above method, wherein updating the first cache associated with the first caching system comprises: finding an entry in the first cache associated with the first data item; and updating the entry in the first cache.

In another example, the above method, wherein the first partition group is a data center and the second partition group is a data center.

In an additional example, the above method, wherein the first partition group comprises multiple virtual machine instances on the same computer.

In an example embodiment, non-transitory computer storage having stored thereon instructions that, when executed by a computer system, cause the computer system to perform operations comprising: receiving, at a first caching system at a first partition group, an update for a first data item; updating a first cache associated with the first caching system with the update for the first data item, the first cache in the first partition group; and causing one or more additional caching systems located in one or more additional partition groups to update caches located at the one or more additional partition groups.

In another example, the above non-transitory computer storage, wherein the first partition group and the one or more additional partition groups are located in different time zones.

In an additional example, the above non-transitory computer storage, wherein the first partition group and the one or more additional partition groups are located in different geographic areas.

In another example, the above non-transitory computer storage, wherein the first cache system and the one or more additional cache systems are part of a distributed caching system that spans multiple partition groups.

In an additional example, the above non-transitory computer storage, wherein updating the first cache associated with the first caching system comprises loading the first data item into the first cache.

In another example, the above non-transitory computer storage, wherein updating the first cache associated with the first caching system comprises: finding an entry in the first cache associated with the first data item; and updating the entry in the first cache.

In an additional example, the above non-transitory computer storage, wherein the first partition group comprises a data center and the second partition group comprises a data center.

In another example, the above non-transitory computer storage, wherein the first partition group comprises multiple virtual machine instances on the same computer.

Fourth Set of Example Embodiments

In an example embodiment, a system for distributed data caching comprises: (1) an inter-process communication interface for receiving requests, the requests including a first request for a first data item from a first requestor; (2) computer memory for storing a local cache; and (3) one or more processors configured to: determine whether the first data item is stored in the local cache; in response to determining that the first data item is not on the local cache, determine a first external cache from a plurality of external caches, the first external cache designated to store the first data item; transmit the first request for the first data item to the external cache; receive the first data item from the first external cache; determine whether a second requestor has requested the first data item; in response to determining that a second requestor has requested the first data item, provide the first data item to the first requestor and the second requestor; and in response to determining that a second requestor has not requested the first data item, provide the first data item to the first requestor.

In another example, the above system, wherein the first requestor is a first client device and the second requestor is a second client device.

In an additional example, the above system, wherein the first requestor is a first thread and the second requestor is a second thread.

In another example, the above system, further comprising a distributed caching client configured to: obtain the first data item from the external cache; and transmit the first data item directly on a network connection to the first client without waking a thread assigned to the first client.

In an additional example, the above system, wherein the one or more processors are configured to determine a first external cache from a plurality of external caches by obtaining an identifier based at least partly on the first request and applying a deterministic algorithm to the identifier such that the deterministic algorithm identifies the first external cache.

In another example, the above system, wherein the one or more processors are further configured to: identify additional requestors of the first data item; and in response to receiving the first data item, provide the first data item to the additional requestors.

In an example embodiment, a method of caching data, the method comprising: receiving, on a first caching system, a first request for a first data item from a first requestor; identifying a first cache designated to store the first data item, the first cache selected from a local cache and a plurality of external caches; obtaining the first data item from the first cache; and in response to determining that a second requestor has requested the first data item, providing the first data item to the first requestor and the second requestor.
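Serving several requestors of the same data item from a single cache fetch can be illustrated with a small single-threaded sketch. The class `CoalescingCache`, its `request` and `complete` methods, and the string requestor labels are hypothetical names chosen for illustration; an actual embodiment may track client devices or threads instead.

```python
class CoalescingCache:
    """Hypothetical front-end that fetches each requested item once."""

    def __init__(self, fetch):
        self.fetch = fetch      # function: key -> value (e.g., a cache read)
        self.waiting = {}       # key -> requestors waiting for that item
        self.fetch_count = 0    # number of fetches actually performed

    def request(self, key, requestor):
        """Register a requestor; return True if this starts a new fetch."""
        first_for_key = key not in self.waiting
        self.waiting.setdefault(key, []).append(requestor)
        return first_for_key

    def complete(self, key):
        """Fetch the item once and hand it to every waiting requestor."""
        value = self.fetch(key)
        self.fetch_count += 1
        requestors = self.waiting.pop(key, [])
        return {requestor: value for requestor in requestors}


backing_cache = {"item-1": "value-1"}
front_end = CoalescingCache(backing_cache.__getitem__)

started = [
    front_end.request("item-1", "requestor-A"),
    front_end.request("item-1", "requestor-B"),
]
delivered = front_end.complete("item-1")
```

Only the first request starts a fetch; the second requestor is attached to the pending fetch and receives the same data item when it completes.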

In another example, the above method, further comprising: identifying additional requestors of the first data item; and in response to receiving the first data item, providing the first data item to the additional requestors.

In another example, the above method, wherein the first cache is the local cache.

In an additional example, the above method, wherein the first cache is one of the plurality of external caches.

In another example, the above method, wherein identifying the first cache from the local cache and the plurality of external caches comprises: determining whether the first data item is stored in the local cache; and in response to determining that the first data item is not on the local cache, determining a first external cache from the plurality of external caches, the first external cache designated to store the first data item; wherein the first cache is the first external cache. Further, in one example, the above method, wherein determining the first external cache from the plurality of external caches comprises: obtaining an identifier based at least partly on a key associated with the first request; and applying a deterministic function to the identifier, wherein the deterministic function identifies the first external cache. Additionally, in one example, the above method, wherein the first request and additional requests identical to the first request are associated with a same key. Further, in one example, the above method, wherein the deterministic function is a consistent hashing function.
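One way a consistent hashing function could map a request key to an external cache is sketched below. The ring layout, the use of SHA-256, the number of virtual points per cache, and all names (`ConsistentHashRing`, `cache_for`, the cache labels) are assumptions for illustration; the disclosure does not prescribe a particular hashing scheme.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring that assigns keys to named external caches."""

    def __init__(self, cache_names, replicas=64):
        # Each cache is placed on the ring at several virtual points.
        self._ring = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in cache_names
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def cache_for(self, key):
        """Return the cache owning the first ring point after the key."""
        index = bisect.bisect_right(self._points, _hash(key))
        return self._ring[index % len(self._ring)][1]


names = ["cache-a", "cache-b", "cache-c"]
ring = ConsistentHashRing(names)
chosen = ring.cache_for("session:1234")

# Removing a cache that does not own the key leaves the assignment unchanged.
other = next(name for name in names if name != chosen)
smaller_ring = ConsistentHashRing([name for name in names if name != other])
still_chosen = smaller_ring.cache_for("session:1234")
```

Because the function is deterministic, identical requests carrying the same key resolve to the same external cache, and removing an unrelated cache from the ring does not move the key.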

In an additional example, the above method, wherein the first requestor is a first client device and the second requestor is a second client device.

In another example, the above method, wherein the first requestor is a first thread and the second requestor is a second thread.

In an example embodiment, non-transitory computer storage having stored thereon instructions that, when executed by a computer system, cause the computer system to perform operations comprising: receiving a first request for a first data item from a first requestor; obtaining the first data item from a first cache; and in response to determining that a second requestor has requested the first data item, providing the first data item to the first requestor and the second requestor.

In another example, the above non-transitory computer storage, wherein obtaining the first data item from the first cache comprises selecting the first cache from a local cache and a plurality of external caches.

In another example, the above non-transitory computer storage, wherein selecting the first cache from the local cache and the plurality of external caches comprises: determining whether the first data item is stored in the local cache; and in response to determining that the first data item is not on the local cache, determining a first external cache from the plurality of external caches, the first external cache designated to store the first data item; wherein the first external cache is the first cache. Further, in one example, the above non-transitory computer storage, wherein determining the first external cache from the plurality of external caches comprises: obtaining an identifier based at least partly on a key associated with the first request; and applying a deterministic function to the identifier, wherein the deterministic function identifies the first external cache.

In another example, the above non-transitory computer storage, further comprising: identifying additional requestors of the first data item; and in response to receiving the first data item, providing the first data item to the additional requestors.

Fifth Set of Example Embodiments

In an example embodiment, a distributed system for resuming security handshakes using non-locally cached data, the system comprising: (1) a plurality of server systems configured to provide services to users, each server system configured to store security handshake data on a distributed caching system; (2) the distributed caching system comprising one or more caching systems, the distributed caching system configured to: receive handshake data for a first handshake transaction between a first client and a first server from the plurality of server systems; store the handshake data in the one or more caching systems; in response to an interruption event that interrupts the first handshake transaction and causes the first client to be assigned to a second server from the plurality of server systems, receive a request for the handshake data from the second server; and provide the second server with the handshake data to enable the first handshake transaction to be resumed.

In another example, the above system, wherein the security handshake is for Transport Layer Security (TLS) protocol.

In an additional example, the above system, wherein the security handshake is for Secure Sockets Layer (SSL) protocol.

In another example, the above system, wherein the handshake data includes cryptographic parameters.

In an additional example, the above system, wherein the handshake data includes data sent in an initial message from the first client and data sent in an initial message from the first server.

In an example embodiment, a method for resuming security handshakes using non-locally cached data, the method comprising: receiving handshake data for a first handshake transaction between a first client and a first server from a plurality of server systems; storing the handshake data in one or more caching systems; in response to an interruption event that interrupts the first handshake transaction and causes the first client to be assigned to a second server from the plurality of server systems, receiving a request for the handshake data from the second server; and providing the second server with the handshake data to enable the first handshake transaction to be resumed.
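The flow in which a second server resumes a handshake from non-locally cached data can be sketched as follows. This is not a TLS or SSL implementation: the classes `HandshakeCache` and `Server`, the dictionary fields, and the placeholder message strings are hypothetical and only model where the handshake data is stored and retrieved.

```python
class HandshakeCache:
    """Hypothetical stand-in for the distributed caching system."""

    def __init__(self):
        self._store = {}

    def put(self, handshake_id, data):
        self._store[handshake_id] = dict(data)

    def get(self, handshake_id):
        return self._store.get(handshake_id)


class Server:
    """Hypothetical server that records handshake data as it proceeds."""

    def __init__(self, name, cache):
        self.name = name
        self.cache = cache
        self.local = {}  # handshake data held only on this server

    def start_handshake(self, handshake_id, client_hello, server_hello):
        data = {"client_hello": client_hello, "server_hello": server_hello}
        self.local[handshake_id] = data
        # Also store the data where other servers can reach it.
        self.cache.put(handshake_id, data)

    def resume_handshake(self, handshake_id):
        data = self.local.get(handshake_id) or self.cache.get(handshake_id)
        if data is None:
            return None  # nothing cached; a new handshake would be needed
        self.local[handshake_id] = data
        return data


shared = HandshakeCache()
first_server = Server("server-1", shared)
second_server = Server("server-2", shared)

first_server.start_handshake("hs-1", "client-initial-message", "server-initial-message")

# Interruption: the client is reassigned to the second server.
resumed = second_server.resume_handshake("hs-1")
```

The second server never saw the initial messages, yet it obtains them from the shared cache and can continue the transaction rather than restart it.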

In another example, the above method, wherein the security handshake is for Transport Layer Security (TLS) protocol.

In an additional example, the above method, wherein the security handshake is for Secure Sockets Layer (SSL) protocol.

In another example, the above method, wherein the handshake data includes cryptographic parameters.

In an additional example, the above method, wherein the handshake data includes data sent in an initial message from the first client and data sent in an initial message from the first server.

In another example, the above method, wherein the handshake data includes one or more of a public key, an encryption key, cryptographic material and a security certificate.

In an example embodiment, non-transitory computer storage having stored thereon instructions that, when executed by a computer system, cause the computer system to perform operations comprising: receiving handshake data for a first handshake transaction between a first client and a first server from a plurality of server systems; storing the handshake data on a caching system accessible from each of the plurality of server systems; receiving a request for the handshake data from a second server of the plurality of the server systems; and providing the second server with the handshake data to enable the first handshake transaction to be resumed by the second server; wherein the first handshake transaction between the first client and the first server was interrupted.

In another example, the above non-transitory computer storage, wherein the security handshake is for Transport Layer Security (TLS) protocol.

In an additional example, the above non-transitory computer storage, wherein the security handshake is for Secure Sockets Layer (SSL) protocol.

In another example, the above non-transitory computer storage, wherein the handshake data includes cryptographic parameters.

In an additional example, the above non-transitory computer storage, wherein the handshake data includes data sent in an initial message from the first client and data sent in an initial message from the first server.

In another example, the above non-transitory computer storage, wherein the handshake data includes one or more of a public key, an encryption key, cryptographic material and a security certificate.

In an example embodiment, a method comprising: receiving handshake data for a first handshake transaction between a first client and a first server; storing the handshake data in a caching system; in response to an interruption event that interrupts the first handshake transaction and causes the first server to lose locally stored handshake data, receiving a request for the handshake data from the first server; and providing the first server with the handshake data to enable the first handshake transaction to be resumed.

Each of the processes, methods and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code modules executed by one or more computers, computer processors, or machines configured to execute computer instructions. The code modules may be stored on any type of non-transitory computer-readable storage medium or tangible computer storage device, such as hard drives, solid state memory, optical disc and/or the like. The processes and algorithms may be implemented partially or wholly in application-specific circuitry. The results of the disclosed processes and process steps may be stored, persistently or otherwise, in any type of non-transitory computer storage such as, e.g., volatile or non-volatile storage.

The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and subcombinations are intended to fall within the scope of this disclosure. In addition, certain method, event, state or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described tasks or events may be performed in an order other than that specifically disclosed, or multiple may be combined in a single block or state. The example tasks or events may be performed in serial, in parallel, or in some other manner. Tasks or events may be added to or removed from the disclosed example embodiments. The example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed example embodiments.

Conditional language used herein, such as, among others, “can,” “could,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment. The terms “comprising,” “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list. Conjunctive language such as the phrase “at least one of X, Y and Z,” unless specifically stated otherwise, is otherwise understood with the context as used in general to convey that an item, term, etc. may be either X, Y or Z. Thus, such conjunctive language is not generally intended to imply that certain embodiments require at least one of X, at least one of Y and at least one of Z to each be present.

While certain example embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions disclosed herein. Thus, nothing in the foregoing description is intended to imply that any particular feature, characteristic, step, module, or block is necessary or indispensable. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions disclosed herein.

What is claimed is:
 1. A system for distributed data caching, the system comprising: a first caching system in a first partition group, the first caching system configured to: receive a first request to retrieve a data item; in response to the first request to retrieve the data item and determining that the first caching system does not store the data item: retrieve the data item from a primary storage, the primary storage separate from the first caching system; update a first cache in the first partition group by adding the data item; and transmit the data item that was determined to not be stored in the first caching system to a second caching system in a second partition group; and the second caching system in the second partition group, the second caching system configured to: in response to receiving the data item transmitted by the first caching system, update a second cache in the second partition group by adding the data item; in response to receiving a second request to retrieve the data item from a requestor, obtain the data item from the second cache; and provide the data item to the requestor.
 2. The system of claim 1, wherein the first partition group and the second partition group are located in different time zones.
 3. The system of claim 1, wherein the first partition group and the second partition group are located in different geographic areas.
 4. The system of claim 1, wherein the first cache system and the second cache system are part of a distributed caching system that spans multiple partition groups.
 5. A method of caching data in a distributed system, the method comprising: receiving a first request to retrieve a data item at a first caching system at a first partition group; in response to the first request to retrieve the data item and determining that the first caching system does not store the data item: retrieving the data item from a primary storage, the primary storage separate from the first caching system; updating a first cache associated with the first caching system by adding the data item, the first cache in the first partition group; and transmitting the data item that was determined to not be stored in the first caching system to a second caching system in a second partition group, thereby causing the second caching system to update a second cache on the second partition group by adding the data item and to provide the data item in response to a second request to retrieve the data item.
 6. The method of claim 5, further comprising retrieving the data item from the first cache in the first partition group.
 7. The method of claim 5, further comprising causing the second caching system to, in response to receiving the second request to retrieve the data item, retrieving the data item from the second cache in the second partition group.
 8. The method of claim 5, wherein the first partition group and the second partition group are located in different time zones.
 9. The method of claim 5, wherein the first partition group and the second partition group are located in different geographic areas.
 10. The method of claim 5, wherein the first cache system and the second cache system are part of a distributed caching system that spans multiple partition groups.
 11. The method of claim 5, wherein updating the first cache associated with the first caching system comprises loading the data item into the first cache.
 12. The method of claim 5, wherein updating the first cache associated with the first caching system comprises: finding an entry in the first cache associated with the data item; and updating the entry in the first cache.
 13. The method of claim 5, wherein the first partition group is a data center and the second partition group is a data center.
 14. The method of claim 5, wherein the first partition group comprises multiple virtual machine instances on a same computer.
 15. Non-transitory computer storage having stored thereon instructions that, when executed by a computer system, cause the computer system to perform operations comprising: receiving a first request to retrieve a data item at a first caching system at a first partition group; in response to the first request to retrieve the data item and determining that the first caching system does not store the data item: retrieving the data item from a primary storage, the primary storage separate from the first caching system; updating a first cache associated with the first caching system by adding the data item, the first cache in the first partition group; and causing one or more additional caching systems located in one or more additional partition groups to update caches located at the one or more additional partition groups by adding the data item that was determined to not be stored in the first caching system and to provide the data item in response to one or more second requests to retrieve the data item.
 16. The non-transitory computer storage of claim 15, wherein the first partition group and the one or more additional partition groups are located in different time zones.
 17. The non-transitory computer storage of claim 15, wherein the first partition group and the one or more additional partition groups are located in different geographic areas.
 18. The non-transitory computer storage of claim 15, wherein the first cache system and the one or more additional cache systems are part of a distributed caching system that spans multiple partition groups.
 19. The non-transitory computer storage of claim 15, wherein updating the first cache associated with the first caching system comprises loading the data item into the first cache.
 20. The non-transitory computer storage of claim 15, wherein updating the first cache associated with the first caching system comprises: finding an entry in the first cache associated with the data item; and updating the entry in the first cache.
 21. The non-transitory computer storage of claim 15, wherein the first partition group comprises a data center and the second partition group comprises a data center.
 22. The non-transitory computer storage of claim 15, wherein the first partition group comprises multiple virtual machine instances on a same computer.